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4.  INTRODUCTION 


Temporal  change  of  mass  lesions  overtime  is  a  key  piece  of  information  in  computer-aided 
diagnosis  of  breast  cancer  and  treatment  monitoring,  the  purpose  of  the  project  is  to  develop  an  automatic 
change  detection  method  to  quantitatively  extract  the  clinically  important  changes  of  suspicious  lesions, 
upgrade  the  existing  CAD  system,  and  thus  improve  the  clinical  diagnosis  of  breast  cancer.  We  will  build 
a  site  model  for  each  individual  patient  for  monitoring  the  breast  tissue  changes  and  extend  our  current 
research  on  image  registration  and  change  detection  to  the  early  detection  of  breast  cancer.  Specific  aims 
include:  1)  registration  and  segmentation  of  deformable  breast  tissue  structures  across  a  series  of 
mammograms;  2)  construction  of  a  site  model  of  the  mammogram  for  individual  patients  showing  the 
locations  of  regions  of  interest  and  associated  diagnostic  information;  3)  identification  of  clinically 
significant  changes  in  both  global  and  local  mass  areas  within  the  breast;  and  4)  integration  and  evaluation 
of  the  developed  techniques  with  existing  CAD  prototype.  At  conclusion  of  this  project,  we  anticipate 
achieving  the  following:  1)  establish  a  reliable  technique  of  monitoring  breast  tissue  changes  associated 
with  cancerous  masses;  2)  deliver  a  CAD  prototype  that  can  incorporate  tissue  change  information  from 
additional  mammograms;  3)  evaluate  the  merit  of  combining  change  detection  and  CAD  for  improved 
clinical  diagnosis  using  multiple  mammograms;  and  4)  acquire  the  experience  necessary  to  explore 
multimodality  imaging  for  unified  detection,  diagnosis  and  treatment  assessment  of  breast  cancer. 


5.'BODY-Annual  Summary 

The  long-term  goal  of  this  career  development  project  is  to  develop  image  guided  diagnosis 
methodology  through  change  detection  in  mammogram  sequences  for  breast  cancer  detection.  The 
research  requires  the  knowledge  of  image  analysis,  image  registration,  change  detection,  and 
computational  intelligence. 

During  the  third  year  of  this  career  development  project,  I  continued  a  close  research  collaboration 
with  Dr.  Matthew  Freedman  (breast/lung  imaging)  at  Georgetown  University  Lombardi  Cancer  Center, 
and  further  developed  a  strategic  collaboration  with  Dr.  Zsolt  Szabo  (nuclear  medicine)  at  Johns  Hopkins 
Medical  Institutions  (JHMI).  Through  them,  I  have  learned  more  about  breast  cancer  at  both  imaging  and 
molecular  levels.  I  have  been  appointed  as  an  adjunct  assistant  professor  of  radiology  by  JHMI  and  served 
as  a  panel  member  for  evaluating  Molecular  Imaging  Center  Awards  (P20/P50)  and  Innovative  Molecular 
Analysis  Technologies  (R21/R33)  for  the  National  Cancer  Institute. 

I  continued  serving  as  the  director  of  Computational  Imaging  and  Informatics  Laboratory,  which 
currently  hosts  one  associate  professor,  one  assistant  professor,  one  visiting  professor,  and  five  full-time 
graduate  students.  Three  of  my  graduate  students  are  the  recipients  of  CDMRP-BCRP  Predoctoral 
Traineeships. 

I  have  been  promoted  to  the  rank  of  associate  professor  after  five-year  faculty  service  at  CUA. 

As  the  research  accomplishments  during  the  third  year,  I  have  first  identified  the  following  major 
research  tasks: 

1 .  Develop  a  working  algorithm  for  two-dimensional  registration  of  image  sequence  data  sets.  It  consists 
of  three  major  components:  (1)  control  points/objects  extraction;  (2)  mixture  of  probabilistic  principle 
axes  registration  (mPAR),  and  (3)  deformable  image  warping  through  multilayer  perceptron  (MLP) 
neural  network  computations. 

2.  Develop  new  pattern  recognition  algorithm  to  perform  computer-aided  diagnosis  for  mass  detection, 
where  the  lesion  candidates  are  firstly  identified  by  change  detection  suing  independent  component 
analysis. 

Follow  this  plan,  major  research  accomplishments  are  described  as  follows: 

5.1  Hybrid  non-rigid  image  registration  algorithm 

We  have  developed  a  neural  computation  based  non-rigid  registration  methodology  using  multiple 
rigid  transforms,  in  a  piece-wise  fashion,  to  model  the  registration  process  between  images  in  a  sequence. 
The  registration  methodology  is  a  hybrid  approach  that  combines  registration  without  exact  point 
correspondence  via  multi-object  principal  axes,  and  registration  with  point  correspondence  via  polynomial 
transform.  Neural  computation  is  used,  for  the  first,  to  combine  the  derived  individual  principal  axes 
solutions  for  each  object  in  a  committee  machine  formulation,  and  to  obtain  the  polynomial  transform 
based  on  extracted  control  points  using  a  multilayer  perceptron  (MLP). 

In  our  method,  we 
present  a  neural 
computation  based  non- 
rigid  registration  using 
piece-wise  rigid 

transformation.  The  novel 
feature  is  to  align  two  point 
sets  without  needing  to 
establish  explicit  point 
correspondences,  where  the  derivation  is  realized  by  minimizing  the  relative  entropy  between  the  two 
point  distributions  resulting  in  a  maximum  likelihood  estimate  of  the  transformation  matrix.  A  committee 
machine  approach  is  used  for  recovering  the  transformational  geometry  of  the  non-rigid  structures.  That  is 


Figure  1.  Demonstration  of  mPAR  method  with  simulated  data  set.  The  performance 
metric  is  mean  square  error  (MSE)  of  the  non-control  object.  Several  runs  of  this 
example  were  conducted  to  average  out  randomness  of  the  data  points.  The  results 
show  a  MSE  of  7.6%  between  the  final  image  and  original. 


rather  than  using  a  single  transformation  matrix  which  gives  rise  to  a  large  registration  error,  we  attempt 
to  interpolatively  apply  a  mixture  of  transformations.  By  further  generalizing  PAR  to  a  finite  mixture 
registration  (mPAR)  scheme,  with  a  soft  partitioning  of  the  data  set,  the  mixture  is  fit  using  expectation- 
maximization  (EM) 

algorithm.  We  then  applied 
a  probabilistic  adaptive 
principal  components 
extraction  (PAPEX) 

algorithm,  to  estimate  the 
transformational  of  the 
orthogonal  set  of 
eigenvalues  and 

eigenvectors  of  the  auto¬ 
covariance  matrix.  By 
applying  a  committee  machine  to  a  non-rigid  registration,  using  FMR  as  the  experts  and  PAPEX  as  a 
gating  function,  we  can 
acquire  the  registration 
based  on  a  mixture  of 
piece-wise  transformations 
of  the  data  set.  Then  the 
correspondences  control 
points  are  obtained.  As  a 
final  step,  the  warped 
image  is  obtaining  using 
the  neural  network  based 
non-linear  mapping,  to 
obtain  the  polynomial 
transform  based  on 
extracted  control  points 
using  MLP. 

Three  examples  are 
presented  to  demonstrate 
the  techniques  involved  in  the  process.  The  first  example  uses  four  Gaussian  clusters  and  focuses  on  the 
combination  of  the  multiple  transforms  into  a  composite  transform  using  finite  mixture  modeling 
techniques.  The  next  examples  present  the  complete  process  for  prostate  cancer  registration  and  breast 
sequence  analysis  respectively.  To  verify  performance,  the  results  are  compared  to  non-neural  based 
implementations  and  other  existing  registration  methods.  The  major  part  of  the  work  was  published  in 
IEEE  Proceedings  of  Neural  Network  for  Signal  Processing,  2001  (attached). 

5.2  Feature-Based  Computer-Aided  Detection  of  Masses 

In  the  clinical  course  of  detecting  masses, 
mammographers  usually  evaluate  the  surrounding 
background  of  a  radiodense  when  breast  cancer  is 
suspected.  In  this  study,  we  adapted  this  fundamental 
concept  and  computed  features  of  the  suspicious  region  in 
radial  sections.  These  features  were  then  arranged  by 
circular  convolution  processes  within  a  neural  network, 
which  led  to  an  improvement  in  detecting  mammographic 
masses. 

In  this  study,  randomly  selected  mammograms 
were  processed  by  morphological  enhancement  techniques. 

Radiodense  areas  were  isolated  and  delineated  using  a  region  growing  algorithm  with  a  valley  blocking 


Figure  4.  Difference  images  after  MLP  (right) 
and  TPS  (left)  image  warping. 


Figure  3.  In  this  example,  the  committee  machine  is  used  to  obtain  an  initial 
registration  using  multiple  extracted  objects  (skin-line,  dense  tissue  regions)  in  a  finite 
mixture  scheme.  Then  MLP  was  used  to  determine  the  coefficients  of  a  polynomial 
transform.  From  visual  inspection  we  see  that  by  using  MLP  the  most  of  the  scale 
different  between  images  has  been  corrected,  while  the  TPS  distorts  the  image. 


Figure  2.  Demonstration  of  mPAR  method  with  real  prostate  data  set  which  contains  a 
precise  probabilistic  map  of  prostate  tumor  distribution  and  the  corresponding  anatomic 
structure  of  a  prostate. 


tefchnique.  The  boundary  of  each  region  of  interest  was  then  divided  into  36  sectors  using  36  equi-angle 
dividers  radiated  from  the 
center  of  the  area.  Four 
features  at  each  section  were 
computed:  (1)  the  radius,  (2) 
the  normal  angle  of  the 
boundary,  (3)  the  average 
gradient  along  the  radial 
direction,  and  (4)  the  gray 
value  difference  (i.e.,  contrast) 
along  the  radial  direction. 

Hence,  144  computed  features 
(i.e.,  4  features  per  sector  for 
36  sectors)  were  used  as  input 
values  for  the  newly  invented 
multiple  circular  path  neural 
network  (MCPNN).  The 
neural  network  is  constructed 
to  emphasize  on  the 
correlation  information 

associated  with  the  feature 
interactions  within  the  angle 
and  between  adjacent  angles. 

We  have  tested  this 

approach  on  our  research  database  consisting  of  91  mammograms.  The  over-all  performance  in  the 
detection  of  masses  was  0.78-0.80  for  the  areas  (Az)  under  the  ROC  curves  using  the  conventional  neural 
network.  Later,  the  performance  was  improved  to  Az  values  of  0.84-0.89  using  the  multiple  circular  path 
neural  networks.  The 

major  part  of  the  work 
was  published  in  IEEE 
Transactions  on  Medical 
Imaging,  2001  (Two 
regular  papers  and  one 
Technical  Report  are 
attached). 

5.3  Key  results  and 
discussion 

During  the  second  year  of  the  project,  we  worked  on  the  theoretical  concepts  and  methods  of  a 
neural  computation  based  non-rigid  registration  algorithm.  The  approach  uses  a  committee  machine  to 
recover  the  total  transformational  geometry  of  the  non-rigid  object  using  multiple  rigid  transforms 
combined  together  in  a  finite  mixture  registration  scheme  (mPAR).  Finite  mixture  transform  combination 
is  a  novel  technique  to  combine  multiple  transforms  that  are  contained  in  a  single  image.  Different  from 
local  transforms,  no  other  method  combines  multiple  transforms  together.  In  addition,  finite  mixture 
combinations,  yields  a  lower  MSE  than  the  local  transform  and  results  in  a  smooth  image  while  local 
transforms  yield  an  image  containing  discontinues  on  transform  boundaries.  Furthermore,  the  registration 
obtained  in  the  committee  machine  is  fine  tuned  using  a  non-linear  transform  generated  by  a  MLP 
network  using  extracted  control  points.  To  the  best  of  our  knowledge,  we  may  be  the  first  group  to 
develop  neural  networks  based  image  registration  techniques  for  nonlinear  warping  of  image 
sequences. 
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Figure  6.  A  system  flowchart  for  the  detection  of  masses  in  this  study. 
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(A)  (B)  (C) 

Figure  5.  Three  types  of  network  paths  connecting  the  input  and  the  hidden  layers: 
(1)  Full  connection.  (2)  A  self  correlation  (SC)  path;  each  node  on  the  layer 
connects  to  a  single  set  of  the  features  (l,a,g,c)  for  the  fan-in  and  fully  connects  to 
the  hidden  nodes  for  fan-out.  (C)  A  neighborhood  correlation  (NC)  path;  each  node 
on  the  layer  connects  to  five  adjacent  sets  of  the  features  for  the  fan-in  and  fully 
connects  to  the  hidden  nodes  for  fan-out.  Note  that  the  fan-in  nets  emphasizing  self 
correlation  in  (2)  and  neighborhood  correlation  in  (3)  represent  convolution 
weights  (i.e.,  the  same  type  of  sectors  possess  the  same  set  of  weighting  factors). 


'  We  applied  our  new  algorithms  to  prostrate  and  breast  registration  problems  with  reasonable 
results  as  shown  in  the  previous  figures.  Some  distortion  can  be  seen  in  the  final  warped  images  because 
of  the  error  in  control 
point  selection  and 
correspondence. 

Improvement  is  this 
portion  should  decrease 
distortion  and  yield  a 
smoother  looking  image. 

Using  neural  networks 
in  this  problem  has 
increased  the  generality 
of  this  approach  by 
allowing  the  algorithm 
to  adjust  performance  as 
imaging  condition 
change. 
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Figure  7.  A  schematic  diagram,  showing  the  MCPNN  and  sector  features  of  masses,  was 
used  in  the  following  study. 


In  neural  network  based  mass  detection,  we  randomly  selected  54  computer-segmented  areas 
where  30  patches  were  matched  with  the  radiologist’s  identification  and  24  were  not.  This  database  was 
used  to  train  two  neural  network  systems:  (1)  a  conventional  3-layer  BP  neural  network  (with  125  nodes 
in  the  hidden  layer)  and  (2)  the  proposed  MCPNN  training  method  using  the  same  neural  network 
learning  algorithm.  The  structure  of  the  MCPNN  was  described  earlier.  However,  we  used  one  fully 
connected  path,  four  SC  paths,  four  20°  NC  paths,  four  30°  NC  paths,  three  40°  NC  paths,  and  two  50°  NC 
paths  in  the  first  step  network  connection  for  the  MCPNN.  All  paths  in  the  neural  network  have  their 
hidden  layers.  Only  one 
hidden  layer  per  path  was 
used.  Both  neural 
network  systems  were 
trained  by  the  error  back 
propagation  algorithm  by 
feeding  the  features  from 
the  input  layer  and 
registering  the 

corresponding  target 
value  at  the  output  side. 

Once  the  training  of  the 
neural  networks  was 
complete,  we  then  used 
the  remaining  71 
computer  segmented 
areas  for  the  testing. 

None  of  the  images  and 
their  corresponding 
patients  in  the  testing  set 

could  be  found  in  the  training  set.  The  neural  network  output  values  were  fed  into  the  LABROC  program 
for  the  performance  evaluation.  The  results  indicated  that  the  areas  (Az)  under  the  receiving  operator 
characteristic  (ROC)  curves  were  0.781  and  0.844  using  the  conventional  BPNN  and  the  MCPNN, 
respectively.  The  ROC  curves  of  these  two  neural  network  training  methods  are  shown  in  Figure  8  (left). 
We  also  invited  another  senior  mammographer  to  conduct  an  ROC  observer  study.  The  mammographer 
was  asked  to  rate  each  patch  using  a  numerical  scale  ranging  0-10  for  its  likelihood  of  being  a  mass. 
These  71  numbers  were  also  fed  into  the  LABROC  program.  The  mammographer’ s  performance  in  Az 
on  this  set  of  test  cases  was  0.909.  The  corresponding  ROC  curve  is  also  shown  in  Figure  8  (left). 


ROC  Curves  of  The  Mammographer  and 
Two  Different  Neural  Network  Training 
Methods  in  Experiment  1. 
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ROC  Curves  of  The  Two  Different 
Training  Methods  in  Experiment  2. 
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Figure  8.  The  ROC  curves  obtained  from  corresponding  experiments.  (A)  The  left  figure 
shows  that  the  performance  of  MCPNN  training  method  is  superior  to  that  of  the 
conventional  input  method.  The  highest  curve  is  the  ROC  performance  of  the  senior 
mammographer.  (B)  The  right  figure  shows  the  ROC  results  with  higher  performance 
using  the  leave-one-case-out  procedure  as  described  in  Experiment  2. 


' '  We  also  conducted  a  leave-one-case-out  experiment  using  the  same  database.  In  this  experiment, 
we  used  those  patches  extracted  from  90  mammograms  for  the  training  and  used  the  patches  (most  of 
them  are  single)  extracted  from  the  remaining  one  mammogram  as  test  objects.  The  procedure  was 
repeated  91  times  to  allow  every  suspicious  patch  from  each  mammogram  to  be  tested  in  the  experiment. 
For  each  individual  suspicious  area,  the  computed  features  were  identical  to  those  used  in  Experiment  1 . 
Again,  both  neural  network  systems  were  independently  evaluated  with  the  same  procedure.  The  results 
indicated  that  the  Az  values  were  0.799  and  0.887  using  the  conventional  back  propagation  neural  network 
and  the  MCPNN,  respectively.  Figure  8  (right)  shows  the  ROC  curves  of  these  two  neural  network 
systems  using  the  leave-one-of-out  procedure  in  the  experiment. 

Through  this  study,  we  found  that  the  selected  features  are  somewhat  effective  in  the  detection  of 
masses.  These  features  were  “computationally  translated”  from  the  qualitative  descriptors  of  BI-RAD. 
Another  uniqueness  of  this  study  was  on  the  test  of  our  newly  developed  MCPNN  training  method.  In 
Experiment  1,  we  found  that  the  performances  of  both  neural  network  systems  were  increased  as 
compared  to  existing  algorithms.  This  might  be  due  to  the  increased  number  of  cases  (from  54  to  124)  in 
the  training  set.  In  Experiment  2,  the  Az  value  was  improved  by  0.043  using  the  MCPNN  training  method 
that  was  higher  than  Az  difference  of  0.018  obtained  by  the  conventional  training  method.  The  results 
implied  that  the  MCPNN  learned  more  effectively  than  the  conventional  MCPNN-BP  when  the  number 
of  training  cases  was  increased. 

It  is  known  in  the  field  of  artificial  intelligence  that  the  key  factors  in  pattern  recognition  are:  (1) 
effective  methods  in  the  extraction  of  features  and  (2)  analytic  methods  (e.g.,  back  propagation  neural 
network)  for  the  extracted  features.  In  this  study,  we  showed  that  the  training  method  designed  to  guide 
the  analyzer  is  also  an  important  factor  to  a  success  of  a  pattern  recognition  task.  Though  this  finding  is 
not  new,  the  research  of  developing  training  methods  for  various  pattern  recognition  tasks  has  not 
established  in  the  field  of  medical  imaging.  In  this  work,  we  demonstrated  that  organized  features  with 
proper  network  connection  and  task-oriented  guidance  would  assist  the  neural  network  in  performing  the 
task. 


As  far  as  the  research  in  recognition  of  masses  is  concerned,  we  believe  that  main  concept  of  using 
sectors  is  an  effective  approach.  Note  that  any  features  arranged  in  the  polar  coordinate  system  can  be 
trained  by  the  MCPNN  method.  Since  the  MCPNN  only  coordinates  the  input  data,  the  internal  neural 
network  learning  algorithm  can  be  changed  to  other  learning  algorithms.  A  technique  using  the  rubber 
band  straightening  transformation,  independently  developed  by  Sahnier  et  al.,  for  the  detection  of  masses 
also  employs  a  similar  concept  in  extracting  feature  and/or  texture  in  the  polar  coordinate.  We  believe 
that  integration  of  effective  feature  and  texture  values  computed  at  small  sectors  will  be  the  research  trend 
in  mass  detection.  This  paper  focuses  on  neural  network  design  and  arrangement  of  features  for  effective 
pattern  recognition  of  masses  in  medical  imaging. 


6.  APPENDICES 

6.1  Key  Research  Accomplishments 

•  We  have  proposed  and  developed  mixture  of  principal  axes  registration  (mPAR)  algorithm  to  perform 
initial  non-rigid  image  warping,  without  control  point  correspondence.  A  working  computer  code  has 
been  implemented.  This  work  is  original  and  new. 

•  We  have  proposed  and  developed  multilayer  perceptron  (MLP)  neural  computation  to  perform  final 
non-rigid  image  warping,  with  initially  matched  control  points  correspondence.  A  working  computer 
code  has  been  implemented.  This  work  is  original  and  new. 

•  Our  numerical  evaluation  has  shown  that  new  image  registration  outperform  existing  popular  methods 
such  as  Thin-Plate  Spline  (TSP)  algorithm. 

•  We  have  proposed  and  developed  multiple  circular  path  neural  network  (MCPNN)  classifier  to 
perform  computer-aided  mass  detection,  against  other  existing  methods.  We  have  implemented  related 
algorithms.  This  work  is  original  and  new. 

•  We  have  tested  our  detection  system  using  receiver  operating  characteristics  (ROC)  analysis.  Our 
preliminary  experiment  has  shown  that  new  system  outperform  existing  popular  methods  such  as 
conventional  backpropogation  (BP)  neural  networks. 

6.2  Reportable  Outcomes 
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Abstract— This  paper  presents  a  statistical  model  supported 
approach  for  enhanced  segmentation  and  extraction  of  suspicious 
mass  areas  from  mammographic  images.  With  an  appropriate 
statistical  description  of  various  discriminate  characteristics 
of  both  true  and  false  candidates  from  the  localized  areas,  an 
improved  mass  detection  may  be  achieved  in  computer-assisted 
diagnosis  (CAD).  In  this  study,  one  type  of  morphological  oper¬ 
ation  is  derived  to  enhance  disease  patterns  of  suspected  masses 
by  cleaning  up  unrelated  background  clutters,  and  a  model-based 
image  segmentation  is  performed  to  localize  the  suspected  mass 
areas  using  stochastic  relaxation  labeling  scheme.  We  discuss  the 
importance  of  model  selection  when  a  finite  generalized  Gaussian 
mixture  is  employed,  and  use  the  information  theoretic  criteria  to 
determine  the  optimal  model  structure  and  parameters.  Examples 
are  presented  to  show  the  effectiveness  of  the  proposed  methods 
on  mass  lesion  enhancement  and  segmentation  when  applied  to 
mammographical  images.  Experimental  results  demonstrate  that 
the  proposed  method  achieves  a  very  satisfactory  performance  as 
a  preprocessing  procedure  for  mass  detection  in  CAD. 

Index  Terms — Finite  mixture,  image  enhancement,  image  seg¬ 
mentation,  information  criterion,  morphological  filtering,  relax¬ 
ation  labeling. 


I.  Introduction 

IN  RECENT  years,  several  computer-assisted  diagnosis 
(CAD)  schemes  for  mass  detection  and  classification 
have  been  developed  [1]-[13].  Though  it  may  be  difficult  to 
compare  the  relative  performance  of  these  methods,  because 
the  reported  performance  strongly  depends  on  the  degree  of 
subtlety  of  masses  in  the  selected  database,  accurate  selection 
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of  suspected  masses  is  considered  a  critical  and  first  step  due 
to  the  variability  of  normal  breast  tissue  and  the  lower  contrast 
and  ill-defined  margins  of  masses  [3],  [6],  and  since  no  subtle 
masses  should  be  missed  before  any  further  analysis. 

A  number  of  image  processing  techniques  have  been  pro¬ 
posed  to  perform  suspicious  mass  site  selection.  Kobatake  et  al 
[1]  proposed  using  a  iris  filter  to  detect  tumors  as  suspicious  re¬ 
gions  with  very  weak  contrast  to  their  background.  Sameti  et  al. 
[7]  used  fuzzy  sets  to  partition  the  mammographic  image  data. 
Lau  and  Yin  et  al  independently  proposed  using  bilateral-sub- 
traction  to  determine  possible  mass  locations  [9],  [13].  Some 
other  investigators  proposed  using  pixel-based  feature  segmen¬ 
tation  of  spiculated  masses  [4],  [8].  Kegelmeyer  has  reported 
promising  results  for  detecting  spiculated  tumors  based  on  local 
edge  characteristics  and  Laws  texture  features  [8].  Karssemeijer 
et  al  [4]  proposed  to  identify  stellate  distortions  by  using  the 
orientation  map  of  line-like  structures.  Recently,  Petrick  et  al 
[6]  proposed  a  two-stage  adaptive  density-weighted  contrast  en¬ 
hancement  filtering  technique  along  with  edge  detection  and 
morphological  feature  classification  for  automatic  segmentation 
of  potential  masses.  Kupinski  and  Giger  [3]  presented  a  radial 
gradient  index-based  algorithm  and  a  probabilistic  algorithm  for 
seeded  lesion  segmentation. 

Nevertheless,  to  our  best  knowledge,  few  work  has  been  ded¬ 
icated  to  improve  the  task  of  lesion  site  selection  although  it  is 
indeed  a  very  crucial  step  in  CAD.  Especially,  few  studies  have 
used  and  justified  model-based  image  processing  techniques  for 
unsupervised  lesion  site  selection  [11].  Zwiggelaar  et  aldevel- 
oped  a  statistical  model  to  describe  and  detect  the  abnormal  pat¬ 
tern  of  linear  structures  of  spiculated  lesions  [2].  In  their  work, 
the  probability  density  function  of  the  observation  vectors  for 
each  class  is  assumed  to  be  normal,  we  have  experienced  that 
the  “normal”  distribution  for  each  class  is  nor  true.  Li  et  al  pro¬ 
posed  using  a  Markov  random  field  model  to  extract  suspicious 
masses  for  mass  detection  [11].  In  their  study,  most  of  model 
parameters  were  chosen  empirically,  and  the  mammogram  was 
segmented  into  three  regions  (background,  fat,  and  parenchymal 
or  tumors). 

Stochastic  model-based  image  segmentation  is  a  technique 
for  partitioning  an  image  into  distinctive  meaningful  regions 
based  on  the  statistical  properties  of  both  gray  level  and  context 
images.  A  good  segmentation  result  would  depend  on  suitable 
model  selection  for  a  specific  image  modality  [16],  [17]  where 
.model  selection  refers  to  the  determination  of  both  the  number 
of  image  regions  and  the  local  statistical  distributions  of  each 
region.  Furthermore,  a  segmentation  result  would  be  improved 
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Fig.  1 .  Major  components  in  CAD. 

with  preenhanced  pattern  of  interest  being  segmented.  The  only 
assumption  for  suspected  mass  site  selection  is  that  suspected 
mass  areas  should  be  brighter  than  the  surrounding  breast  tissues 
which  is  valid  for  most  of  the  real  cases.  When  some  masses 
lie  either  within  an  inhomogeneous  pattern  of  fibroglandular 
tissue  or  are  partially  or  completely  surrounded  by  fibroglan¬ 
dular  tissue,  enhancement  of  mass-related  signals  is  important. 

Fig;  1  shows  a  general  block  diagram  of  CAD  systems.  This 
paper  focuses  on  “image  processing”  block,  to  just  automati¬ 
cally  pick  up  all  possible  lesion  sites.  We  aim  on  two  essential 
issues  in  the  stochastic  model-based  image  segmentation;  en¬ 
hancement  and  model  selection.  Based  on  the  differential  geo¬ 
metric  characteristics  of  masses  against  the  background  tissues, 
we  propose  one  type  of  morphological  operation  to  enhance  the 
mass  patterns  on  mammograms.  Then  we  employ  a  finite  gen¬ 
eralized  Gaussian  mixture  (FGGM)  distribution  to  model  the 
histogram  of  the  mammograms  where  the  statistical  properties 
of  the  pixel  images  are  largely  unknown  and  are  to  be  incor¬ 
porated.  We  incorporate  the  EM  algorithm  with  two  informa¬ 
tion  theoretic  criteria  to  determine  the  optimal  number  of  image 
regions  and  the  kernel  shape  in  the  FGGM  model.  Finally,  we 
apply  a  contextual  Bayesian  relaxation  labeling  (CBRL)  tech¬ 
nique  to  perform  the  selection  of  suspected  masses.  The  major 
differences  of  our  work  from  the  previous  work  [l]-[6],  [8]— [13] 
are  as  follows. 

1)  We  present  a  new  algorithm  of  morphological  filtering 
for  image  enhancement  in  which  the  combined  operations 
are  applied  to  the  original  gray  tone  image  and  the  higher 
sensitive  lesion  site  selection  of  the  enhanced  images  are 
observed. 

2)  We  justify  and  pilot  test  the  FGGM  distribution  in  mod¬ 
eling  mammographic  pixel  images  together  with  a  model 
selection  procedure  based  on  the  two  information  theo¬ 
retic  criteria.  This  allows  an  automatic  identification  of 
both  the  number  ( K )  and  kernel  shape  (a)  of  the  distri¬ 
butions  of  tissue  types. 

3)  We  develop  a  new  algorithm  (CBRL)  for  segmenting 
mass  areas  where  the  comparable  results  are  achieved 
as  those  using  Markov  random  field  model-based 
approaches  while  with  much  less  computational  com¬ 
plexity. 

The  presentation  of  this  paper  is  organized  as  follows.  In  Sec¬ 
tion  II,  the  proposed  dual  morphological  operation  enhancement 
technique  is  described  in  detail.  The  theory  and  algorithm  on 


FGGM  modeling,  model  selection,  and  parameter  estimation 
are  presented  in  Section  III.  This  is  followed  by  a  discussion 
on  the  selection  of  suspicious  masses  using  the  CBRL  approach. 
Evaluation  results  are  given  and  discussed  in  Section  IV.  Finally, 
the  paper  is  concluded  by  Section  V. 

n.  Morphological  Enhancement 

One  of  the  main  difficulties  in  suspicious  mass  segmentation 
is  that  mammographic  masses  are  often  overlapped  with  dense 
breast  tissues.  Therefore,  it  is  necessary  to  remove  bright  back¬ 
ground  caused  by  dense  breast  tissues  while  preserving  the  fea¬ 
tures  and  patterns  related  to  the  masses.  For  this  purpose,  back¬ 
ground  correction  is  an  important  step  for  mass  segmentation. 
We  propose  a  mass  pattern-dependent  background  removal  ap¬ 
proach  using  morphological  operations. 

A.  Morphological  Filtering  Theory 

Morphological  operations  can  be  employed  for  many  image 
processing  purposes,  including  edge  detection,  region  segmen¬ 
tation,  and  image  enhancement.  The  beauty  and  simplicity  of 
mathematical  morphology  approach  come  from  the  fact  that  a 
large  class  of  filters  can  be  represented  as  the  combination  of 
two  simple  operations:  erosion  and  dilation.  Let  Z  denote  the 
set  of  integers  and  / (z,  j)  denote  a  discrete  image  signal,  where 
the  domain  set  is  given  by  {z,  j}  e  'N\  x  N2}  Nj  x  N2  C  Z2 
and  the  range  set  by  {/}  6  iV3,  iV3  c  Z.  A  structuring  element 
B  is  a  subset  in  Z 2  with  a  simple  geometrical  shape  and  size. 
Denote  Bs  =  {-6  :  b  e  B}  as  the  symmetric  set  of  B  and 
Btut2  as  the  translation  of  B  by  (*i,  t2),  where  (ti,  t2)  €  Z2. 
The  erosion  /  ©  Bs  and  dilation  /©Bs  can  be  expressed  as 
[19] 

(feBs)(iJ)=  min  (f(tut2))  (1) 

*lt  j 

(/  ©  Bs)(i,  j )  =  max  (/(*!,  t2)).  (2) 

On  the  other  hand,  opening  foB  and  closing  /  •  B  are  defined 
as  [19] 

(foB)(iJ)=((feB*)®B)(iJ)  (3) 

(f*B)(iJ)=((f®Bs)eB)(iJ).  (4) 

A  gray  value  image  can  be  viewed  as  a  two-dimensional  sur¬ 
face  in  a  threerdimensional  space.  Given  an  image,  the  opening 
operation  removes  the  objects,  which  have  size  smaller  than  the 
structuring  element,  with  positive  intensity.  Thus,  with  the  spec¬ 
ified  structuring  element,  one  can  extract  different  image  con¬ 
texts  by  taking  the  difference  between  the  original  and  opening 
processed  image,  which  is  known  as  “tophat”  operation  [19]. 

B.  Morphological  Enhancement  Algorithms 

Based  on  the  properties  of  morphological  filters,  we  designed 
one  type  of  mass  pattern-dependent  enhancement  approaches. 
The  algorithm  is  implemented  by  dual  morphological  tophat  op¬ 
erations  following  by  a  subtraction  which  is  described  as  fol¬ 
lows. 
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Step  1)  The  textures  without  the  pattern  information  of  in¬ 
terest  are  extracted  by  a  tophat  operation 

ri(i,  j )  =  max(0,  [/(*',  j)  -  (/  o  Si)(i,  j)])  (5) 

where  /(«,  j)  is  the  original  image,  and  ri(i,  j )  is 
the  residue  image  between  the  original  image  and  the 
opening  of  the  original  image  by  a  specified  struc¬ 
turing  element  B\ .  The  size  of  B\  should  be  chosen 
smaller  than  the  size  of  masses. 

Step  2)  Let  r2(i,  j)  be  the  mass  pattern  enhanced  image  by 
background  correction,  i.e.,  by  the  second  tophat  op¬ 
eration  on 

ri (*,  j)  =  max(0,  [f(i,  j )  -  (/  o  j )])  (6) 

where  B2  is  a  specified  structuring  element  which 
has  a  larger  size  than  masses. 

Step  3)  The  enhanced  image  /i(i,  j )  can  be  derived  as 


fi(i,  j)  =  max(0,  [r2(i,  j )  -  rfyi,  j)]).  (7) 

This  operation  is  called  “dual  morphological  operation.”  It 
can  remove  the  background  noise  and  the  structure  noise  inside 
the  suspected  mass  patterns.  Fig.  2  shows  the  mass  patch  and 
the  enhanced  results  of  each  step  using  the  dual  morphological 
operation..  As  we  can  see  from  Fig.  2,  both  background  correc¬ 
tion  [Fig.  2(c)]  and  dual  morphological  operation  [Fig.  2(d)]  en¬ 
hanced  the  mass  pattern,  but  dual  morphological  operation  re¬ 
moved  more  structural  noise  inside,  the  mass  region  which  in 
turn  would  improve  the  mass  segmentation  results. 

EL  Model-Based  Segmentation 
A.  Statistical  Modeling 

Given  a  digital  image  consisting  of  N\  x  N2  pixels,  assume 
this  image  contains  K  regions.  By  randomly  reordering  all 
pixels  in  the  underlying  probability  space,  one  can  treat  pixel 
labels  as  random  variables  and  introduce  a  prior  probability 
measure  71**..  Then  the  FGGM  probability  density  function  (pdf) 
of  gray  level  of  each  pixel  is  given  by  [17] 

K 

p(xi)  =  Y^nkPk(xi),  i  =  1, . . . ,  iVii\r2, 

fc=i 

*<  =  0,  1,  1  (8) 


where  x,  is  the  gray  level  of  pixel  i,  and  L  is  the  number  of  gray 
levels.  pk(xi)s  are  conditional  region  pdfs  with  the  weighting 
factor  irk,  satisfying  irk  >  0.  and  irk  =  1.  The  general¬ 
ized  Gaussian  pdf  given  region  k  is  defined  by 


Pk(Xi)  = 


aPk 

2T(l/a) 


exP[—  \Pk{xi  —  Pk)\a]  >  a>0, 


T(3/a) 


(7k  lT(l/a)J 


1/2 


(9) 


where  pk  is  the  mean,  r(-)  is  the  Gamma  function.  0k  is  a  pa¬ 
rameter  related  to  the  variance  ak ■  It  can  be  shown  that  when 


(a)  (b) 


(c)  (d) 


Fig.  2.  Original  and  enhancement  result  of  the  mass  patch  using 
dual-morphological  operation,  (a)  Original  image  block  /(i,  j).  (b) 
Textures  rx(i,  j).  (c)  Background  correction  result  r2(t,  j).  (d)  Enhanced 
-result /j(i,j). 


a  =  2.0,  one  has  the  Gaussian  pdf;  when  a  =  1.0,  one  has  the 
Laplacian  pdf.  When  a  >  1,  the  distribution  tends  to  a  uniform 
pdf;  when  a  <1,  the  pdf  becomes  sharp.  Therefore,  the  gener¬ 
alized  Gaussian  model  is  a  suitable  model  to  fit  the  histogram 
distribution  of  those  images  whose  statistical  properties  are  un¬ 
known  since  the  kernel  shape  can  be  controlled  by  selecting  dif¬ 
ferent  a  values. 

The  whole  image  can  be  well  approximated  by  an  in¬ 
dependent  and  identically  distributed  random  field  X.  The 
corresponding  joint  pdf  is 

NiN2  K 

-  n  e  KkPkfa)  (10) 

t=i  *=i 

where  x  =  [xi,  rc2)  . . . ,  ^  x  G  X.  pk(£i)  is  given 

in  (9).  Based  on  the  joint  probability  measure  of  pixel  images, 
the  likelihood  function  under  FGGM  modeling  can  be  expressed 
as  £(r)  =  nilf2  Prfa)  where  r  iK>  Pi b,  o-fc,  k  = 

1,  . . .  ,  K}  denotes  the  model  parameter  set. 

B.  Model  Identification 

With  an  appropriate  system  likelihood  function,  the  objective 
of  model  identification  is  to  estimate  the  model  parameters  by 
maximizing  the  likelihood  function,  or  equivalently  minimizing 
the  relative  entropy  between  the  image  histogram  px(w)  and 
the  estimated  pdf  pT(u ),  where  u  is  the  gray  level.  Based  on 
the  FGGM  model,  the  EM  algorithm  is  applied  to  estimate  the 
model  parameters.  The  EM  algorithm  is  an  iterative  technique 
for  maximum-likelihood  (ML)  estimation  [20].  Recently,  it  has 
been  used  in  many  medical  imaging  applications  [15].  Instead 
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of  evaluating  directly  the  value  of  ML,  we  use  the  global  rel¬ 
ative  entropy  (GRE)  between  the  histogram  and  the  estimated 
FGGM  distribution  to  measure  the  performance  of  parameter 
estimation,  given  by 

GRE(px||Pr)  =  £  Px(u)  log  (11) 

Motivated  by  the  same  spirit  of  conventional  EM  algorithm 
for  finite  normal  mixtures  (FNMs),  we  formulated  the  EM  al¬ 
gorithm  to  estimate  the  parameter  values  of  the  FGGM.  The  al¬ 
gorithm  is  summarized  as  follows. 

EM  Algorithm: 

1)  For  Ot  =  GJxninj  •  •  ■  j  ^max 

•  m  =  0,  given  initialized  r<°) 

•  E-step:  for  i  =  1,  . . . ,  NtN2i  k  =  i 
compute  the  probabilistic  membership 

_(m)  _ 

Zik  ~  - •  (12) 

J2^Pk(Xi) 

k  —  l 

•  M-step:  for  k  =  1,  . , . ,  K,  compute  the  updated 
parameter  estimates 


»i”+i> = 


2(m+l) 


1  n,n2 


Ki  N,x'" 


J  ) 

•  When  |GRE(m\px||pr)  -  GRE(m+1>(px||pr)|  <  e 
is  satisfied,  go  to  Step  2  Otherwise,  m  =  m  + 1  and 
go  to  E-Step. 

2)  Compute  GRE,  and  go  to  Step  1. 

3)  Choose  the  optimal  r  which  corresponds  to  the  minimum 
GRE. 

As  we  mentioned  in  Section  I,  the  two  important  parameters 
in  model  selection  are  K  and  a.  Determination  of  the  region 
parameter  K  directly  affects  the  quality  of  the  resulting  model 
parameter  estimation  and  in  turn,  affects  the  result  of  segmen¬ 
tation.  In  this  paper  we  propose  an  approach  to  determine  the 
value  of  K  based  on  two  popular  information  theoretic  criteria 
introduced  by  Akaike  [23]  and  by  Rissanen  [24].  Akaike  pro¬ 
posed  to  select  the  model  that  gives  the  minimum  Akaike  infor¬ 
mation  criterion  (AIC),  defined  by 

AlC(if)  =  -21og(£(rML))  +  2  Kf  (14) 

where  r ml  is  the  ML  estimate  of  the  model  parameter  set  r, 
and  K '  is  the  number  of  free  adjustable  parameters  in  the  model 
[15],  [23].  AIC  criterion  will  select  the  correct  number  of  the 
image  regions  K0  when 


Kq  =  arg<  min 
1  1<K<Kma 


AIC(if)| . 


Rissanen  addressed  the  problem  from  a  quite  different  point 
of  view.  Rissanen  reformulated  the  problem  explicitly  as  an  in¬ 
formation  coding  problem  in  which  the  best  model  fitness  is 
measured  such  that  it  assigns  high  probabilities  to  the  observed 
data  while  at  the  same  time  the  model  itself  is  not  too  complex  to 
describe  [24].  The  model  is  selected  by  minimizing  the  total  de¬ 
scription  length  defined  by  minimum  description  length  (MDL) 

MDL(K)  =  -  \og{C(vML))  +  0.5  K'  log  (N^).  (16) 

Similarly,  the  correct  number  of  the  distinctive  image  regions 
K0  will  be  estimated  when 


=  argj 


min  MDL(K) 

1<K<Kuax  v  ' 


C.  Bayesian  Relaxation  Labeling 

Once  the  FGGM  model  is  given,  a  segmentation  problem  is 
the  assignment  of  labels  to  each  pixel  in  the  image.  A  straight¬ 
forward  way  is  to  label  pixels  into  different  regions  by  maxi¬ 
mizing  the  individual  likelihood  function  Pk(x).  This  approach 
is  called  ML  classifier,  which  is  equivalent  to  a  multiple  thresh¬ 
olding  method.  Usually,  this  method  may  not  achieve  a  good 
performance  since  there  is  lack  of  local  neighborhood  informa¬ 
tion  to  be  included  to  make  a  good  decision.  CBRL  algorithm 
[25]  is  one  of  the  approaches,  which  can  incorporate  the  local 
neighborhood  information  into  labeling  procedure  and  thus  im¬ 
prove  the  segmentation  performance.  In  this  study,  we  devel¬ 
oped  the  CBRL  algorithm  to  perform/refine  pixel  labeling  based 
on  the  localized  FGGM  model,  which  is  defined  as  follows. 

Let  di  be  the  neighborhood  of  pixel  i  with  anmxm  template 
centered  at  pixel  i .  An  indicator  function  is  used  to  represent  the 
local  neighborhood  constraints  %(/*,  l5)  =  I(lh  ^),  where  U 
and  lj  are  labels  of  pixels  i  and  j,  respectively.  Note  that  pairs 
of  labels  are  now  either  compatible  or  incompatible.  Similar  to 
reference  [25],  one  can  compute  the  frequency  of  neighbors  of 
pixel  i  which  has  the  same  label  values  k  as  at  pixel  i 


=  p(Zi  =  A:|W)  = 


m2  —  1 


£  nk,h) 


where  1^  denotes  the  labels  of  the  neighbors  of  pixel  i.  Since 
4**  is  a  conditional  probability  of  a  region,  the  localized  FGGM 
pdf  of  gray  level  Xi  at  pixel  i  is  given  by 

k 

p(*;|ldi)  =  £  Tr[l]pk(x2)  (19) 

k- 1 

where  is  given  in  (9).  Assuming  gray  values  of  the  image 
are  conditional  independent,  the  joint  pdf  of  x,  given  the  context 
labels  1,  is 

N\N2  k 

p(xii)  =  n  £  ^p^i)  (20) 

i=l  fc=l 

where  l  =  (U  :  i  =  1,  . . . ,  NiN2): 

It  is  known  that  CBRL  algorithm  can  obtain  a  consistent  la¬ 
beling  solution  based  on  the  localized  FGGM  model  (19).  Since 
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TABLE  I 

Distribution  of  the  Effective  Size  of  the  186  Masses  Used  in  This  Study,  The  Effective  Size  Is  Defined  as  the  Square  Root  of 
the  Product  of  the  Maximum  and  Minimum  Diameters  of  the  Mass 

|  0  —  5 mm  |  6  —  10mm  |  11 -15mm  |  16  —  20mm  |  21  —  25mm  |  26  —  30mm 
Tl  3  I  55  I  78  1  29  I  I V  I  4 


1  represents  the  labeled  image,  it  is  consistent  if  5i(Zi)  >  Si(Ar), 
for  all  k  =  1,  . . . ,  K  and  for  i  =  1,  , . . ,  N\N2  [25],  where 

Si(k)  =  (21) 

Now  we  can  define 

nxn2  /  \ 

^i)  =  E  ( E  *)$(*) )  (22) 

as  the  average  measure  of  local  consistency,  and 

LCi  =  £  I(lu  k)Si(k),  i  =  1,  . . . ,  NXN2  (23) 

k 

represents  the  local  consistency  based  on  1.  The  goal  is  to  find 
a  consistent  labeling  1  which  can  maximize  (22).  In  the  real 
application,  each  local  consistency  measure  LCi  can  be  max¬ 
imized  independently.  In  [25],  it  has  been  shown  that  when 
Rij(h ,  lj)  =  h),  if  A(l)  attains  a  local  maximum  at 

1,  then  1  is  a  consistent  labeling. 

Based  on  the  localized  FGGM  model,  l ^  can  be  initialized 
by  ML  classifier 

lj0)  =  argjmax pk(xi) j  ,  k  —  l,...,K.  (24) 

Then,  the  order  of  pixels  is  randomly  permutated  and  each  label 
li  is  updated  to  maximize  LG i.e.,  classify  pixel  i  into  fcth 
region  if 


U  =  argjmax  TVk  Pk(xi)j  >  fc  =  1,  — ,  jftT  (25) 

where  pk(xi)  is  given  in  (9),  7rj^  is  given  in  (18).  By  consid¬ 
ering  (24)  and  (25), we  developed  a  modified  CBRL  algorithm 
as  follows. 

CBRL  Algorithm: 

1)  Given  1^,  m  =  0 

2)  Update  pixel  labels 

•  Randomly  visit  each  pixel  for  i  =1, ... ,  N±N2 

•  Update  its  label  according  to 

ijm)  =  arg  jmax  ff£){m)pfc(a;j)  j . 


3)  When 


£(l(m+1>  ©  l<ro>) 


stop;  otherwise,  m  =  m- hi,  and  repeat  Step  2. 

IV.  Experimental  Results  and  Discussion 

In  this  section,  we  present  the  results  of  using  the  morpho¬ 
logical  filtering  and  model-based  segmentation  approach  we 
have  introduced  for  enhancement  and  segmentation  of  suspi¬ 


cious  masses  in  mammographic  images.  In  addition  to  the  qual¬ 
itative  assessment  by  the  radiologists,  we  introduce  several  ob¬ 
jective  measures  to  assess  the  performance  of  the  algorithms  we 
have  proposed  for  enhancement  and  segmentation. 

A  testing  data  set  of  200  mammograms  and  two  simulated 
tone  images  were  used  to  test  and  evaluate  the  performance  of 
the  algorithms  in  this  study.  The  mammograms  were  selected 
from  the  Mammographic  Image  Analysis  Society  (MIAS)  data¬ 
base  and  the  Brook  Army  Medical  Center  (BAMC)  database 
created  by  the  Department  of  Radiology  at  Georgetown  Uni¬ 
versity  Medical  Center.  Of  the  200  mammograms,  50  mammo¬ 
grams  are  normal,  and  each  of.  the  150  abnormal  mammograms 
contains  at  least  one  mass  case  of  varying  size,  subtlety,  and 
location.  The  areas  of  suspicious  masses  were  identified  by  an 
expert  radiologist  based  on  visual  criteria  and  biopsy  proven 
results.  The  total  data  set  includes  113  benign  and  73  malig¬ 
nant  masses.  The  distribution  of  the  masses  in  terms  of  size 
is  shown  in  Table  I.  The  BAMC  films  were  digitized  with  a 
laser  film  digitizer  (Lumiscan  150)  at  a  pixel  size  of  100  /xmx 
100  fim  and  4096  gray  levels  (12  bits).  Before  the  method  was 
applied  the  digital  mammograms  were  smoothed  by  averaging 
4x4  pixels  into  one  pixel.  According  to  radiologists,  the  size 
of  small  masses  is  3-15  mm  in  effective  diameter.  A  3-mm 
object  in  an  original  mammogram  occupies  30  pixels  in  a 
digitized  image  with  a  100-pm  resolution.  After  reducing  the 
image  size  by  four  times,  the  object  will  occupy  the  range  of 
about  seven  to  eight  pixels.  The  object  with  the  size  of  seven 
pixels  is  expected  to  be  detectable  by  any  computer  algorithm. 
Therefore,  the  shrinking  step  is  applicable  for  mass  cases  and 
can  save  computation  time. 

Experimental  Evaluation  of  Morphological  Enhance¬ 
ment:  In  order  to  justify  the  suitability  of  morphological 
structural  elements,  the  geometric  properties  of  the  contexts 
and  textures  in  mammograms  were  studied.  The  basic  idea 
is  to  keep  all  mass-like  objects  within  certain  size  range  and 
remove  all  others  by  using  the  proposed  morphological  filters 
with  specific  structural  elements.  At  the  resolution  of  400 
pm,  a  disk  with  a  diameter  of  seven  pixels  was  chosen  as  the 
morphological  structuring  elements  Bi  to  extract  textures  in 
mammograms.  Since  the  smallest  masses  have  seven  pixels  in 
diameter  with  the  resolution  of  400  ^m,  this  procedure  would 
not  destroy  mass  information.  For  the  purpose  of  background 
correction,  a  disk  with  a  diameter  of  75  pixels  was  used  as 
the  morphological  structuring  element  B2.  An  object  with  a 
diameter  of  75  pixels  corresponds  to  30  mm  in  the  original 
mammogram.  This  indicates  that  all  masses  with  sizes  up  to 
30  mm  can  be  enhanced  by  background  correction.  Masses 
larger  than  30  mm  are  rare  cases  in  the  clinical  setting.  In  the 
last  stage  of  our  approach,  we  applied  morphological  opening 
and  closing  filtering  using  a  disk  with  a  diameter  of  five  to 
eliminate  small  objects  which  also  contribute  to  texture  noise. 
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(a)  (b) 

Fig.  3.  (a)  Original  simulated  test  image  for  model  selection  (fc0  =  4,  SNR  =  10  dB)  and  (b)  the  AIC/MDL  curves  in  model  selection  (cr  =  30). 


All  testing  mammograms  were  processed  using  the  proposed 
enhancement  approach  with  the  suggested  structuring  element 
B i  and  #2-  Fig-  5  shows  processed  mammogram  examples 
using  the  morphological  enhancement.  Compared  the  enhanced 
results  [Fig.  5(b)  and  (d)]  with  the  original  mammograms 
[Fig.  5(a)  and  (c)],  the  proposed  method  not  only  enhanced  all 
suspected  mass  patterns  and  reduced  the  texture  noise,  but  also 
removed  the  background  noise.  In  summary,  the  proposed  mor¬ 
phological  enhancement  approach  can  enhance  mass  patterns 
and  remove  texture  structure  noises.  For  dense  mammograms, 
such  as  the  second  example  in  Fig.  5(c)  and  (d),  the  mass 
is  obscured  by  dense  fibroglandular  tissues,  our  experience 
shows  applying  the  dual  morphological  operation  to  remove 
the  fibroglandular  tissue  background  is  useful.  In  addition 
to  the  visual  evaluation  by  the  radiologist,  we  performed  the 
segmentation  to  assess  the  effectiveness  of  the  morphological 
filtering,  based  on  the  enhanced  mammograms  and  the  original 
mammograms. 

Simulated  Evaluation  of  Segmentation  Algorithms:  The 
performance  of  model  selection  using  two  frequently  used 
methods,  i.e.,  the  AIC  and  MDL  [22],  were  first  tested  and 
compared  in  the  simulation  study.  The  computer-generated  data 
was  made  up  of  four  overlapping  normal  components.  Each 
component  represents  one  local  region.  The  value  for  each 
component  were  set  to  a  constant  value,  the  noise  of  normal 
distribution  was  then  added  to  this  simulation  digital  phantom. 
Three  noise  levels  with  different  variance  were  set  to  keep  the 
same  signal-to-noise  ratio  (SNR),  where  SNR  is  defined  by 

SNR  =  101og10  (26) 

<J* 

where  A fi  is  the  mean  difference  between  regions,  and  a2  is 
the  noise  power.  The  original  data  for  the  simulation  study  are 


(c)  (d) 


Fig.  4.  Image  segmentation  by  CBRL  on  simulated  image  (with  initialization 
by  ML  classification),  (a)  ML  initialization,  (b)  First  iteration  in  CBRL.  (c) 
Second  iteration  in  CBRL.  (d)  Third  iteration  in  CBRL. 


TABLE  II 

Comparison  of  CBRL,  ICM,  and  MICM  Algorithm:  Simulated  Data 


Item 

|  CBRL  Result 

ICM  Result 

MICM  Result 

Classification  Error 

|  0.7935% 

i  0.7508%  : 

0.3113% 

given  in  Fig.  3(a).  The  AIC  and  MDL  curves,  as  functions  of  the 
number  of  local  clusters  AT,  are  plotted  in  Fig.  3(b).  According 
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(a)  (b)  (c)  (d) 


Fig.  5.  Examples  of  mass  enhancement,  (a)  Original  mammogram,  (b)  Enhanced  mammogram,  (c)  and  (d)  Another  original  mammogram  and  its  enhanced  result. 


to  the  information  theoretic  criteria,  the  minima  of  these  curves 
indicate  the  correct  number  of  the  local  regions.  From  this  ex¬ 
perimental  figure,  it  is  clear  that  the  number  of  local  regions 
•  suggested  by  these  criteria  are  all  correct. 

For  the  validation  of  image  segmentation  using  CBRL,  we 
apply  the  algorithm  first  to  a  simulated  image.  We  use  ML  clas¬ 
sifier  to  initialize  image  segmentation,  i.e.,  to  initialize  the  quan¬ 
tified  image  by  selecting  the  pixel  label  with  largest  likelihood 
at  each  node.  The  classification  error  after  initialization  is  uni¬ 
formly  distributed  over  the  spatial  domain  as  shown  in  Fig.  4(a). 
Our  experience  suggested  this  to  be  a  very  suitable  starting  point 
for  contextual  relaxation  labeling  [21].  The  CBRL  is  then  per¬ 
formed  to  fine  tune  the  image  segmentation.  It  should  be  em¬ 
phasized  that  the  ground  truth  is  known  in  this  simulated  ex¬ 
periment,  the  percentage  of  total  classification  error  is  used  as 
the  criterion  for  evaluating  the  performance  of  segmentation 
technique.  In  Fig.  4(a)-(d),  the  initial  segmentation  by  the  ML 
classification  and  the  stepwise  results  of  three  iterations  in  the 
CBRL  are  presented.  In  this  experiment,  algorithm  initializa¬ 
tion  results  in  an  average  classification  error  of  30%.  It  can 
be  clearly  seen  that  a  dramatic  improvement  is  obtained  after 
several  iterations  of  the  CBRL  by  using  local  constraints  de¬ 
termined  by  the  context  information.  In  addition,  the  conver¬ 
gence  is  fast  as  one  can  see,  after  the  first  iteration  most  of 
the  misclassification  are  removed.  We  have  also  implemented 
two  other  independent  and  popular  algorithms,  namely,  the  it¬ 
erated  conditional  mode  (ICM)  and  the  modified  iterated  con¬ 
ditional  mode  (MICM)  algorithms,  so  as  to  assess  the  compar¬ 
ative  performance  of  the  segmentation  results  among  different 
approaches  [21],  [22] .  The  only  assumption  being  made  by  these 
.  three  methods  is  the  Markovian  property  of  the  context  images 
which  can  be  well  justified  by  the  underlying  cell  oncology 
and  pathology.  We  have  applied  these  three  algorithms  to  the 
same  testing  image  and  the  corresponding  classification  errors 
are  presented  in  Table  II.  The  final  percentage  of  classification 
errors  for  Fig.  4(d)  is  0.7935%.  From  this  experimental  compar¬ 
ison,  it  can  be  concluded  that  three  algorithms  achieved  com- 


table  m 

Computed  AICs  for  the  FGGM  Model  with  Different  a 


K 

a  =  1.0 

a  =  2.0 

a  =  3.0 

a  =  4.0 

2 

651250 

650570 

650600 

650630 

3 

646220 

644770 

645280 

646200 

4 

645760 

644720 

645260 

646060 

5 

645760 

644700 

645120 

646040 

6 

645740 

644670 

645110 

645990 

7 

645640 

644600 

645090 

645900 

8 

645550(min) 

644570  (min) 

645030(min) 

645850(min) 

9 

645580 

644590 

645080 

645880 

10 

645620 

644600 

645100 

645910 

TABLE  IV 

Computed  MDLs  for  the  FGGM  Model  with  Different  a 


I< 

a  =  1.0 

a  =  2.0 

a  —  3.0 

a  =  4.0 

2 

651270 

650590 

650630 

650660 

3 

646260 

644810 

645360 

646350 

4 

645860 

644770 

645280 

646150 

5 

645850 

644770 

645280 

646100 

6 

645790 

644750 

645150 

646090 

7 

645720 

644700 

645120 

645930 

8 

645680(min) 

644690(min) 

645100(min) 

645900(min) 

9 

645710  ' 

644710 

645140 

645930 

10 

645790 

644750 

645180 

645960 

parable  segmentation  accuracy  and  the  result  produced  by  the 
MICM  algorithm  is  most  superior,  though  in  terms  of  computa¬ 
tional  complexity  the  CBRL  algorithm  is  the  least.  It  should  be 
noticed  that  since  in  MICM  algorithm  an  inhomogeneous  con¬ 
figuration  of  the  Markov  random  field  is  used,  its  superior  per¬ 
formance  is  reasonable. 

On  Model-Based  Segmentation — Real  Case  Study :  In  the 
real  case  study,  we  used  two  information  criteria  (AIC  and 
MDL)  to  determine  K.  Tables  III  and  IV  shows  the  AIC  and 
MDL  values  with  different  K  and  a  of  the  FGGM  model  based 
on  one  original  mammogram.  As  it  can  be  seen  from  Tables  III 
and  IV,  although  with  different  a,  all  AIC  and  MDL  values 
achieve  the  minimum  when  K  =  8.  It  indicates  that  AIC  and 
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MDL  are  relatively  insensitive  to  the  change  of  a.  With  this 
observation,  we  can  decouple  the  relation  between  K  and  a 
and  choose  the  appropriate  value  of  one  while  fixing  the  value 
of  another.  Fig.  6(a)  and  (b)  are  two  examples  of  AIC  and  MDL 
curves  with  different  K  and  fixed  a  =  3.0.  Fig.  6(a)  is  based  on 
the  original  mammogram  and  Fig.  6(b)  is  based  on  the  enhanced 
mammogram.  As  we  can  see  in  Fig.  6(a),  both  criteria  achieved 
the  minimum  when  K  —  8.  It  should  be  noticed  that  though  no 
ground  truth  is  available  in  this  case,  our  extensive  numerical 
experiments  have  shown  a  very  consistent  performance  of 
the  model  selection  procedure  and  all  the  conclusions  were 
strongly  supported  by  the  previous  independent  work  reported 
by  [14].  Fig.  6(b)  indicates  that  K  =  4is  the  appropriate 
choice  for  the  mammogram  enhanced  by  dual  morphological 
operation.  This  is  believed  to  be  reasonable  since  the  number 
of  regions  decrease  after  background  correction. 

We  fixed  K  =  8,  and  changed  the  value  of  a  for  estimating 
the  FGGM  model  parameters  using  the  proposed  EM  algorithm 
with  the  original  mammogram  The  GRE  value  between  the  his¬ 
togram  and  the  estimated  FGGM  distribution  was  used  as  a  mea¬ 
sure  of  the  estimation  bias.  We  found  that  GRE  achieved  a  min¬ 
imum  distance  when  the  FGGM  parameter  a  =  3,0  as  shown  in 
Fig.  7.  The  similar  result  was  shown  when  we  applied  the  EM 
algorithm  to  the  enhanced  mammogram  with  K  =  4.  This  in¬ 
dicated  that  the  FGGM  model  might  be  better  than  the  FNM 
model  (a  =  2.0)  in  modeling  mammographic  images  when 
the  true  statistical  properties  of  mammograms  are  generally  un¬ 
known,  though  the  FNM  has  been  most  often  chosen  in  many 
previous  work  [15]. 

After  the  determination  of  all  model  parameters,  every  pixel 
of  the  image  was  labeled  to  a  different  region  (from  1  to  K ) 
based  on  the  CBRL  algorithm.  We  then  selected  the  brightest  re- 


TABLE  V 

Comparison  of  Segmantation  Error  Resulting  from  Noncontextual 
and  Contextual  Methods 


Method 

|  Soft  Classification 

Bayesian  Classification 

CBRL 

GRE  Value 

|  0.0067 

0.4406 

0.1578 

gion,  which  corresponding  to  label  Ky  plus  a  criterion  of  closed 
isolated  area,  as  the  candidate  region  of  suspicious  masses.  Ac¬ 
cording  to  the  visual  inspections  by  the  radiologists,  when  we 
use  K  -  1  instead  of  K ,  the  results  are  over-segmented.  For  the 
case  of  using  K  +  1,  the  results  are  under-segmented.  In  order 
to  quantify  the  performance  differences  between  the  different 
segmentation  methods,  several  groups  have  suggested  that  the 
segmentation  results  may  be  compared  against  radiologists’  out¬ 
lines  of  the  lesions  [3].  Though  the  proposed  comparison  mea¬ 
sures  are  quantitative,  the  performance  measures  are  still  quali¬ 
tative,  since  the  reference  base  (e.g.,  gold  standard  by  the  radi¬ 
ologists)  is  qualitative,  subjective,  and  imperfect.  Therefore,  in 
this  model-supported  approach,  in  addition  to  the  visual  inspec¬ 
tions  by  the  radiologists,  we  have  also  introduced  an  objective 
measure,  the  GRE  between  the  histogram  of  the  pixel  images 
Px(^)  and  the  FGGM  of  the  segmented  image  p*}i(u)  to  assess 
the  performance  of  the  segmentation,  defined  by 

GRE(px(u)||px,  i(«))  =  Yl,  1o§  — (2?) 

where  1  is  the  context  image  estimated  by  the  segmentation  al¬ 
gorithm.  Considering  that  the  ergodic  theorem  is  the  most  fun¬ 
damental  principle  in  the  detection  and  estimation  theory,  it  is 
believed  that  when  a  good  segmentation  is  achieved,  the  dis¬ 
tance  between  the  p*(u)  and  px>  i(u)  should  be  minimized  and 
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(C)  (d) 

Fig.  7.  Comparison  of  learning  curves  and  histogram  of  the  original  mammogram  with  different  a,  A:  =  8.  The  optimal  a  =  3.0.  (a)  «  =  1.0,  GRE  —  0.0783. 
(b)  (a)  a  =  2.0,  GRE  =  0.0369.  (c)  ft  -  3.0,  GRE  =  0.0251.  (d)  ft  =  4.0,  GRE  =  0.0282. 


Fig.  8.  Suspected  mass  segmentation  results  based  on  the  original  mammogram,  (b)  Result  based  on  the  enhanced  mammogram,  l\  =  4.  a  —  3.0.  (c)  and  (d) 
Results  based  on  another  original  mammogram  and  its  enhanced  image. 
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(a) 


Fig  9.  Examples  of  normal  mixed  fatty  and  glandular  mammogram,  (a)  Original  mammogram,  (b)  Segmentation  result  based  on  the  original  mammogram,  (c) 
Enhanced  mammogram,  (d)  Result  based  on  the  enhanced  mammogram,  k  —  4,  cv  ~  3.0. 


Fig.  10.  Examples  of  normal  dense  mammogram,  (a)  Original  mammogram,  (b)  Segmentation  result  based  on  the  original  mammogram,  (c)  Enhanced 
mammogram,  (d)  Result  based  on  the  enhanced  mammogram,  k  =  4,  a  =  3.0. 


this  measure  links  the  image  text  and  its  sample  averages.  Our 
experience  has  suggested  that  this  post-segmentation  measure 
may  be  a  suitable  objective  criterion  for  evaluating  the  quality 
of  image  segmentation  in  a  fully  unsupervised  situation  [22], 
[26]— [28].  Table  V  shows  our  evaluation  data  from  three  dif¬ 
ferent  segmentation  methods  when  applied  to  the  real  images. 

Performance  of  Combined  Morphological  Filtering  and 
Model-Based  Segmentation  using  a  Larger  Database:  The 
proposed  segmentation  method  was  used  to  extract  suspicious 
mass  regions  from  the  200  testing  mammograms.  Without  en¬ 
hancement,  a  total  of  1 142  potential  mass  regions  were  isolated 
including  1 14  of  the  1 86  true  masses.  With  enhancement,  a  total 
of  3143  potential  mass  regions  were  extracted  including  181  of 
the  186  true  masses.  The  results  demonstrated  that  more  true 
masses  were  picked  up  after  enhancement  although  more  false 
cases  were  also  included.  The  undetected  areas  mainly  occurred 
at  the  lower  intensity  side  of  the  shaded  objects  or  obscured  by 
fibroglandular  tissues  that,  however,  were  extracted  on  morpho¬ 


logical  enhanced  mammograms.  In  addition,  when  the  margins 
of  masses  are  ill  defined,  only  parts  of  suspicious  masses  were 
extracted  from  the  original  mammograms.  For  the  purpose  of 
“lesion  site  selection,”  we  believe  that  the  sensitivity  should  be 
the  sole  criterion  for  the  performance  evaluation  of  the  method. 
We  have  181/186  versus  114/186.  Our  method  is  unsupervised 
and  automatic  and  does  not  involve  any  detection  effort  at  this 
moment.  To  our  best  knowledge,  there  is  no  objective  criterion 
available  for  the  evaluation  of  image  enhancement  performance 
before  a  detection  effort  is  involved.  We  only  claimed  that  the 
enhancement  step  is  important  and  effective  with  respect  to  the 
purpose  of  “lesion  site  selection.” 

Fig.  8  demonstrates  some  segmentation  results  based  on  the 
original  and  enhanced  mammograms.  We  compared  the  seg¬ 
mentation  results  based  on  the  enhanced  mammogram  ( K  = 
4,  and  a  =  3.0)  with  those  based  on  the  original  mammogram 
(K  =  8,  and  a  =  3.0)  as  shown  in  Fig.  8.  Comparing  the  re¬ 
sults  in  Fig.  8(b)  with  those  in  Fig.  8(a),  we  can  see  that  after 
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Fig.  1 1 .  Comparison  results  of  segmantation  based  on  the  enhanced  mammograms.  Black  outlines  denote  the  computer-segmented  result.  White  outlines  denote 
ther  radiologist-segmanted  results. 


enhancement,  a  more  accurate  region  was  detected  for  the  sus¬ 
pected  mass  which  has  ill-defined  margin.  Getting  an  accurate 
suspected  region  is  a  crucial  issue  since  geometric  features  are 
extracted  based  on  suspected  regions  and  these  features  are  very 
important  for  further  true  mass  detection.  In  addition,  we  ob¬ 
served  that  one  suspected  mass  was  missed  in  Fig.  8(a)  but  was 
detected  in  Fig.  8(b).  As  we  have  mentioned  in  Section  I,  none  of 
the  suspected  masses  should  be  missed  in  the  segmentation  step. 
Fig.  8(c)  and  (d)  demonstrate  the  segmentation  of  a  suspected 


mass  that  lies  in  dense  breast  tissue.  As  shown  in  Fig.  8(c),  the 
whole  fibroglandular  tissue  area  was  segmented  when  based  on 
the  original  mammogram.  After  enhancement,  the  suspected  re¬ 
gion  was  segmented  exactly  as  shown  in  Fig.  8(d). 

We  have  also  included  the  segmentation  results  on  the  normal 
mammograms.  Fig.  9  demonstrate  the  segmentation  results 
based  on  the  original  and  enhanced  mixed  fatty  and  glandular 
mammograms.  Fig.  10  demonstrate  the  segmentation  results 
based  on  the  original  and  enhanced  dense  mammograms.  We 
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would  like  to  emphasize  that  the  objective  of  this  paper  is 
to  provide  a  segmentation  technique  which  can  enhance  and 
extract  potential  mass  site  from  the  background  so  that  the 
characterization  of  the  related  mass  pattern  can  be  accurately 
extracted  in  terms  of  focused  feature  selection  and  analysis. 
The  method  of  course  will  produce  many  mass-like  areas,  but 
it  will  be  a  plausible  outcome  since  the  accurate  description  of 
nonmass  cases  characterized  by  mass-like  sites  will  benefit  the 
follow-on  detection  step  where  the  performance  of  the  classifier 
depends  on  an  accurate  separation  of  mass  and  nonmass  in  the 
featured  spaces.  The  details  will  be  described  in  [29]. 

For  the  purpose  of  evaluating  the  performance  of  the  segmen¬ 
tation  method,  we  used  both  simulated  studies  and  expert  visual 
inspection  to  validate  the  methods  and  results.  The  radiologist 
has  concluded  that  the  lesion  characteristics  after  the  proposed 
enhancement  have  been  better  displayed  and  all  possible  lesion 
areas  have  been  successfully  identified.  In  addition  to  the  vi¬ 
sual  inspection,  we  have  measured  the  overlap  between  the  com¬ 
puter-segmented  and  the  radiologist  segmented  mass  regions  to 
evaluate  our  method.  Fig.  11  shows  the  comparison  results  of 
segmentation  based  on  the  enhanced  mammograms.  Fig.  1 1  in¬ 
cludes  60  benign  and  malignant  mass  patches  which  were  cut 
from  the  whole  mammograms  after  the  segmentation.  The  white 
outline  was  drawn  by  the  radiologist  while  the  black  outline  was 
produced  by  the  computer  and  was  superimposed  upon  the  orig¬ 
inal  image.  As  we  can  see  from  Fig.  1 1,  for  most  of  cases,  the 
ratio  of  mutual  overlap  area  of  the  radiologist  segmented  mass 
region  and  the  computer-segmented  mass  region  to  the  radiol¬ 
ogist  segmented  mass  area  is  large  than  50%.  In  addition,  even 
the  poorest  result  picked  the  true  lesion  in  the  correct  location 
and  depicted  the  characteristics  of  the  mass  reasonably.  It  is  im¬ 
portant  to  understand  that  “lesion  area  segmentation”  is  not  our 
objective,  so  there  is  no  “best”  or  “worst”  segmentation  results. 
Our  objective  is  “lesion  site  selection”  with  a  possible  highest 
sensitivity  through  a  global  unsupervised  enhancement  and  seg¬ 
mentation  scheme. 

V.  Conclusion 

In  this  paper,  we  propose  a  combined  method  of  using  mor¬ 
phological  operations,  a  FGGM  modeling,  and  a  CBRL  to  en¬ 
hance  and  segment  various  breast  tissue  textures  and  suspicious 
mass  lesions  from  mammographic  images.  This  phase  is  a  cru¬ 
cial  step  in  mass  detection  for  an  improved  CAD.  We  empha¬ 
sized  the  importance  of  model  selection  which  includes  the  se¬ 
lection  of  the  number  of  image  regions  K  and  the  selection  of 
FGGM  kernel  shape  controlled  by  a.  The  experimental  results 
indicate  that  the  suspected  masse  sites  selection  can  be  affected 
by  different  K  and  a.  We  proposed  the  EM  algorithm  together 
with  the  information  theoretic  criteria  to  determine  the  optimal 
K  and  a.  With  optimal  K  and  a,  the  segmentation  results  can  be 
significantly  improved.  We  also  showed  that  with  the  proposed 
pattern-dependent  enhancement  algorithm  using  morphological 
operations,  the  subtle  masses  can  be  segmented  more  accurately 
than  those  when  the  original  image  is  used  for  extraction  without 
enhancement.  To  summarize,  the  morphological  filtering  en¬ 
hancement  combined  with  the  stochastic  model-based  segmen¬ 
tation  is  an  effective  way  to  extract  mammographic  suspicious 
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patterns  of  interest,  and  thereby  may  facilitate  the  overall  per¬ 
formance  of  mammographic  CAD  of  breast  cancer. 
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Abstract — Based  on  the  enhanced  segmentation  of  suspicious 
mass  areas,  further  development  of  computer-assisted  mass  detec¬ 
tion  may  be  decomposed  into  three  distinctive  machine  learning 
tasks:  1)  construction  of  the  featured  knowledge  database,*  2)  map¬ 
ping  of  the  classified  anchor  unclassified  data  points  in  the  data¬ 
base;  and  3)  development  of  an  intelligent  user  interface.  A  decision 
support  system  may  then  be  constructed  as  a  complementary  ma¬ 
chine  observer  that  should  enhance  the  radiologists  performance  in 
mass  detection.  We  adopt  a  mathematical  feature  extraction  pro¬ 
cedure  to  construct  the  featured  knowledge  database  from  all  the 
suspicious  mass  sites  localized  by  the  enhanced  segmentation.  The 
optimal  mapping  of  the  data  points  is  then  obtained  by  learning  the 
generalized  normal  mixtures  and  decision  boundaries,  where  a  is 
developed  to  carry  out  both  soft  and  hard  clustering.  A  visual  ex¬ 
planation  of  the  decision  making  is  further  invented  as  a  decision 
support,  based  on  an  interactive  visualization  hierarchy  through 
the  probabilistic  principal  component  projections  of  the  knowledge 
database  and  the  localized  optimal  displays  of  the  retrieved  raw 
data.  A  prototype  system  is  developed  and  pilot  tested  to  demon¬ 
strate  the  applicability  of  this  framework  to  mammographic  mass 
detection. 

Index  Terms — Feature  extraction,  knowledge  database,  mass  de¬ 
tection,  neural  network,  visual  explanation. 


I.  Introduction 

IN  ORDER  to  improve  mass  lesion  detection  and  classifi¬ 
cation  in  clinical  screening  and/or  diagnosis  of  breast  can¬ 
cers,  many  sophisticated  computer-assisted  diagnosis  (CAD) 
systems  have  been  recently  developed  [1  ]— [10].  Although  the 
clinical  roles  of  the  CAD  systems  may  still  be  debatable,  the 
fundamental  role  should  be  complementary  to  the  radiologists’ 
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Fig.  1.  Major  components  in  CAD. 


clinical  duties,  where  the  pathways  of  achieving  ultimate  perfor¬ 
mance  enhancement  taken  by  the  machine  observer  and  human 
observer  may  not  necessarily  be  close.  For  example,  CAD  sys¬ 
tems  may  attack  the  tasks  that  the  radiologists  cannot  perform 
well  or  find  difficult  to  perform.  Because  of  generally  larger  size 
and  complex  appearance  of  masses,  especially  the  existence  of 
spicules  in  malignant  lesions,  as  compared  with  microcalcifi¬ 
cations,  feature-based  approaches  are  largely  adopted  in  many 
CAD  systems  [l]-[4],  [6],  [7].  Kegelmeyer  has  first  reported 
promising  results  for  detecting  spiculated  tumors  based  on  local 
edge  characteristics  and  Laws  texture  features  [7].  Zwiggelaar 
et  al  developed  a  statistical  model  to  describe  and  detect  the 
abnormal  pattern  of  linear  structures  of  spiculated  lesions  [1]. 
Karssemeijer  et  al.  [2]  proposed  to  identify  stellate  distortions 
by  using  the  orientation  map  of  line-like  structures.  Petrick  et 
al.  presented  to  reduce  the  false  positive  detection  by  combining 
the  breast  tissue  composition  information  [4].  Zhang  et  al.  used 
the  Hough  spectrum  to  detect  spiculated  lesions  [6], 

Although  many  previously  proposed  approaches  have  led 
to  impressive  results  [1]— [5],  [7],  several  fundamental  issues 
remain  unresolved  in  the  application  of  CAD  systems.  Fig.  1 
shows  a  general  block  diagram  of  CAD  systems.  Previous 
research  has  demonstrated  that:  1)  breast  cancer  is  missed  on 
mammograms  in  part  because  the  optical  density  and  contrast 
of  the  cancer  is  not  optimal  for  human  observer;  2)  com¬ 
puter-based  detection  appears  to  be  more  affected  by  different 
criteria  than  human  perception;  3)  the  challenges  and  pathways 
to  the  human  or  machine  observers  may  be  quite  different,  and 
4)  decision  making  by  the  CAD  systems  are  largely  not  trans¬ 
parent  to  the  user.  For  example,  the  training  cases  contributing 
to  the  database  are  often  selected  by  the  human  observer 
while  the  featured  knowledge  database  is  constructed  through 
mathematical  pathways  of  feature  extraction.  The  mismatch 
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between  the  human  supervised  case  selection  in  training  and 
the  machine  dominant  mass  candidates  selection  in  testing 
may  exist.  Second,  the  featured  knowledge  database  is  often 
high-dimensional  with  complex  internal  structures.  Imposing 
a  heuristically  designed  neural  network  for  learning  from  the 
training  data  set  may  prevent  a  correct  identification  of  the 
intrinsic  data  structure  and  an  accurate  estimation  of  the  class 
boundaries.  There  may  also  exist  the  mismatch  between  the 
data  structure  and  classifier  architecture  or  between  the  class 
boundaries  and  decision  boundaries.  Furthermore,  since  the 
machine  observer  and  human  observer  may  not  detect  the  same 
set  of  masses,  the  “black  box”  nature  of  most  CAD  systems  to 
the  clinical  users  will  prevent  a  natural  on-line  integration  of 
human  intelligence  and  further  upgrade  of  a  CAD  system.  An 
interactive  user  interface  should  be  considered  to  leverage  the 
complementary  roles  of  the  CAD  in  the  clinical  practice. 

As  a  step  toward  improving  the  performance  of  a  CAD 
system,  we  have  put  considerable  efforts  to  conduct  various 
studies  and  develop  reliable  image  enhancement  and  lesion  se¬ 
lection  techniques.  The  methods  and  results  have  been  reported 
in  [24],  where  the  purposes  of  the  research  were  to  localize  the 
potential  mass  sites  and  help  accurate  feature  extraction.  This 
paper  addresses  the  further  development  of  computer-assisted 
mass  detection  based  on  the  1)  construction  of  the  featured 
knowledge  database;  2)  mapping  of  the  classified  and/or  un¬ 
classified  data  points  in  the  database;  and  3)  development  of  an 
intelligent  user  interface  (IUI).  The  clinical  goal  is  to  eliminate 
the  false  positive  sites  that  correspond  to  normal  dense  tissues 
with  mass-like  appearances  through  featured  discrimination. 
We  adopt  a  mathematical  feature  extraction  procedure  to  con¬ 
struct  the  featured  knowledge  database  from  all  the  suspicious 
mass  sites  localized  by  the  enhanced  segmentation.  The  optimal 
mapping  of  the  data  points  is  then  obtained  by  .  learning  the 
generalized  normal  mixtures  and  decision  boundaries,  where  a 
probabilistic  modular  neural  network  (PMNN)  is  developed  to 
carry  out  both  soft  and  hard  clustering.  A  visual  explanation  of 
the  decision  making  is  further  invented  as  a  decision  support 
tool,  based  on  an  interactive  visualization  hierarchy  through  the 
probabilistic  principal  component  projections  of  the  knowledge 
database  and  the  localized  optimal  displays  of  the  retrieved  raw 
data.  The  motivation  of  this  work  comes  from  the  following 
considerations.  First,,  though  both  human  and  machine  ob¬ 
servers  use  the  same  set  of  raw  data  in  the  diagnostic  stage,  the 
construction  of  the  knowledge  database  for  training  machine 
classifiers  and  that  accomplished  by  human  brains  are  indeed 
different.  Thus,  the  knowledge  database  should  be  established 
with  both  machine  and  expert  organized  representative  cases. 
Second,  a  quantitative  understanding  of  the  knowledge  database 
used  by  the  machine  observer  should  be  acquired  to  logically 
compare  and/or  predict  the  performance  of  CAD  systems  with 
respect  to  the  human  observers  without  possible  under-  or 
over-estimation,  and  to  optimize  the  feature  extraction  and 
design  of  the  machine  learner  for  best  final  performance. 
Finally,  since  the  human  and  machine  observers  indeed  take 
different  learning  and  intelligence  pathways,  an  IUI  should  be 
developed  to  visually  (e.g.,  transparently)  explain  the  entire 
internal  decision  making  process  of  the  CAD  system  to  the 
human  observer  to  enhance  the  clinical  decision  when  facing 
either  consistent  or  conflicting  opinions. 


The  major  differences  between  our  work  and  the  previous 
work  [1]-[10]  are  as  follows. 

1)  We  construct  a  knowledge  database  by  combining  both 
expert  and  machine  selected  cases  where  the  assignment 
of  class  memberships  (e.g.,  mass  and  nonmass  classes)  is 
supervised  by  the  radiologists  or  pathological  report  after 
all  the  cases  are  collected. 

2)  We  impose  a  model  identification  procedure  to  determine 
the  optimal  number  and  kernel  shape  of  the  local  clus¬ 
ters  within  each  of  the  two  classes  in  a  high-dimensional 
feature  space.  The  model  is  then  estimated  using  the  ex¬ 
pectation-maximization  (EM)  algorithm  and  information 
theory, 

3)  We  develop  a  PMNN,  which  is  considered  as  a  nonlinear 
classifier,  to  carry  out  the  mapping  function  of  the  knowl¬ 
edge  database.  In  the  knowledge  database,  the  decision 
likelihood  boundaries  and  the  class  prior  probabilities  are 
determined  in  a  separate  fashion,  and  the  structure  of 
PMNN  is  optimized  by  adapting  to  the  database  structure. 

4)  We  derive  a  probabilistic  principal  component  projection 
scheme  to  reduce  the  dimensionality  of  the  feature  space 
for  natural  human  perception.  The  scheme  leads  to  a  hi¬ 
erarchical  visualization  algorithm  allowing  the  complete 
data  set  to  be  analyzed  at  the  top  level,  with  best  separated 
clusters  and  subclusters  of  data  points  analyzed  at  deeper 
levels. 

The  framework  of  the  proposed  method  for  mass  detection  is 
illustrated  in  Fig.  2.  A  detailed  description  of  this  paper  is  orga¬ 
nized  as  follows.  In  Section  II,  the  procedure  of  the  knowledge 
database  construction  is  described.  The  data  mapping  process 
for  decision  making  is  presented  in  Section  m.  Section  IV 
presents  the  design  of  the  IUI  for  the  CAD  systems.  Finally, 
major  results  and  discussions  are  summarized  in  Section  V. 

n.  Knowledge  Database  Construction 

Given  the  available  information  contained  in  the  raw  data  of 
mass  sites  and  in  order  to  establish  machine  intelligence  carried 
out  by  various  machine  observers,  a  knowledge  database  may 
be  constructed  in  a  multidimensional  feature  space.  It  should  be 
emphasized  however  that  the  knowledge  acquired  by  the  human 
brain  uses  much  more  sophisticated  processes  than  the  artificial 
systems.  Though  feature  extraction  has  been  a  key  step  in  most 
pattern  analysis  tasks,  the  mathematical  procedures  are  often 
done  intuitively  and  heuristically.  The  general  guidelines  are: 

1)  Discrimination:  Features  of  patterns  in  different  classes 
should  have  significantly  different  values. 

2)  Reliability:  Features  should  have  similar  values  for  the 
patterns  of  the  same  class. 

3)  Independence:  Features  should  not  be  strongly  correlated 
to  each  other. 

4)  Optimality:  Some  redundant  features  should  be  deleted. 
A  small  number  of  features  is  preferred  for  reducing  the 
complexity  of  the  classifier. 

Many  useful  image  features  have  been  suggested  previously 
by  both  image  processing  and  pattern  analysis  communities 
[1 1]— [13].  These  features  can  be  divided  into  three  categories, 
namely,  intensity  features,  geometric  features,  and  texture 
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Fig.  2.  The  flow  diagram  of  mass  detection  in  digital  mammograms. 


features,  whose  values  are  calculated  from  the  pixel  matrices 
of  the  regions  of  interest  (ROIs).  Though  these  features  are 
mathematically  well  defined,  they  may  not  be  complete  since 
they  cannot  capture  all  of  the  capable  aspects  of  human  per¬ 
ception  nature.  Thus,  in  this  study,  we  have  included  several 
additional  expert-suggested  features  to  reflect  the  radiologists’ 
experience.  The  typical  features  are  summarized  in  Table  I, 
where  Fig.  3  shows  the  raw  image  of  corresponding  featured 
sites. 

The  joint  histogram  of  the  feature  point  distribution  extracted 
from  true  and  false  mass  regions  are  investigated,  and  the  fea¬ 
tures  that  can  better  separate  the  true  and  false  mass  regions 
are  selected  for  further  study.  Our  experience  has  suggested  that 
three  features,  i.e.,  the  site  area,  two  measured  compactness  (cir¬ 
cularity),  and  difference  entropy,  were  having  better  discrimi¬ 
nation  and  reliability  properties.  Their  definitions  are  given  as 
follows. 

1)  Compactness  1 


(1) 


where  A  is  the  area  of  the  actual  suspected  region,  and 
A\  is  the  area  of  the  overlapped  region  of  A  and  the  ef¬ 
fective  circle  Ac,  which  is  defined  as  the  circle  whose  area 
is  equal  to  A  and  is  centered  about  the  corresponding  cen¬ 
troid  of  A. 

2)  Compactness  2 


C2  = 


P 

47tA 


(2) 


where  P  is  the  boundary  perimeter,  and  A  is  the  area  of 
region. 


TABLE  I 

The  Summary  of  Mathematical  Features 


Feature  Sub-Space  |  Features 


A.  Intensity  Features 

1 .  contrast  measure  of  ROIs; 

2.  standard  derivation  inside  ROIs; 

3.  mean  gradient  of  ROIs  boundary 

B.  Geometric  Features 

1.  area  measure; 

2.  circularity  measure; 

3.  deviation  of  the  normalized  radial  length; 

4.  boundary  roughness; 

C.  Texture  Features 

1.  energy  measure; 

2.  correlation  of  co-occurrence  matrix; 

3.  inertia  of  co-occurrence  matrix; 

4.  entropy  of  co-occurrence  matrix; 

5.  inverse  difference  moment; 

6.  sum  average; 

7.  sum  entropy; 

S.  difference  entropy; 

9.  fractal  dimension  of  surface  of  ROI; 

3)  Difference  Entropy 


L-l 

DHd'9  =  -  ^2  Px-y{k)\ogpx-y(k)  (3) 

fe=0 

where 

L-l  L— 1 

px-y(k)  =  i).  I*  -  i\  =  k.  (4) 

i=0  j= 0 


Several  important  observations  are  worth  reiteration: 

1)  The  knowledge  database  that  will  be  used  by  the  CAD 
system  are  constructed  from  the  cases  selected  by  both 
lesion  localization  procedure  and  human  expert’s  experi¬ 
ence.  This  joint  set  provides  more  complete  knowledge  to 
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(c) 

Fig.  3.  One  example  of  mass  segmentation  and  boundary  extraction,  (a)  Mass 
patch;  (b)  segmentation;  (c)  boundary  extraction. 


the  machine  observer.  In  particular,  during  the  interactive 
decision  making,  CAD  system  can  still  provide  opinion 
when  the  cases  are  missed  by  the  localization  procedure 
but  presented  to  the  system  by  the  radiologists. 

2)  The  knowledge  database  is  defined  quantitatively  in  a 
high  dimensional  feature  space.  It  provides  not  only  the 
knowledge  for  training  the  machine  observer,  but  also  an 
objective  base  for  evaluating  the  quality  of  feature  extrac¬ 
tion  or  network’s  learning  capability,  and  the  on-line  vi¬ 
sual  explanation  possibility. 

3)  The  assignment  of  the  cases’  class  memberships  (e.g., 
mass  and  nonmass  classes)  is  supervised  by  the  radiolo¬ 
gists  or  pathological  reports.  A  complete  knowledge  data¬ 
base  includes  three  subsets:  raw  data  of  mass-like  sites, 
corresponding  feature  points,  and  class  membership  la¬ 
bels. 


El.  Data  Mapping  For  Decision  Making 

The  decision  making  support  by  a  CAD  system  addresses  the 
problem  of  mapping  a  knowledge  database,  given  a  finite  set 
of  data  examples.  The  mapping  function  can  therefore  be  inter¬ 
preted  as  a  quantitative  representation  of  the  knowledge  about 
the  mass  lesions  contained  in  the  database  [14].  Instead  of  map¬ 
ping  the  whole  data  set  using  a  single  complex  network,  it  is 
more  practical  to  design  a  set  of  simple  class  subnets  with  local 
mixture  clusters,  each  one  of  which  represents  a  specific  region 
of  the  knowledge  space.  Inspired  by  the  principle  of  divide-and- 
conquer  in  applied  statistics,  PMNN  has  become  increasingly 
popular  in  machine  learning  research  [14],  [15],  [19]-[22].  In 
this  section,  we  present  its  applications  to  the  problem  of  map¬ 
ping  from  databases  in  mass  detection,  with  a  constructive  cri¬ 
terion  for  designing  the  network  architecture  and  the  learning 
algorithm  that  are  governed  by  information  theory  [25]. 


A.  Statistical  Modeling 

The  quantitative  mapping  of  a  database  may  be  decomposed 
into  three  distinctive  learning  tasks:  the  detection  of  the  struc¬ 
ture  of  each  class  model  with  local  mixture  clusters;  the  estima¬ 
tion  of  the  data  distributions  for  each  induced  cluster  inside  each 
class;  and  the  classification  of  the  data  into  classes  that  realizes 
the  data  memberships.  Recently,  there  has  been  considerable 
success  in  using  finite  mixture  distributions  data  mapping  [15], 
[17],  [18],  [20],  Assume  that  the  data  points  X{  in  a  multidimen¬ 
sional  database  come  from  M  classes  {u/i,  . . . ,  ujri  . . . ,  }, 

and  each  class  contains  Kr  clusters  . . . ,  0*,  . . . ,  0kt  }, 
where  c ur  is  the  model  parameter  vector  of  class  r,  and  Ok  is  the 
kernel  parameter  vector  of  cluster  k  within  class  r.  The  class 
conditional  probability  measure  for  any  data  point  inside  the 
class  r,  i.e.,  the  standard  finite  mixture  distribution  (SFMD),  can 
be  obtained  as  a  sum  of  the  following  general  form: 

Kr 

f{u\Cjr)  -  7rjfc5(«|0fc)  (5) 

fc=i 

where  irk  =  P(9k\£r)  with  a  summation  equal  to  one,  and 
g(u\0k)  is  the  kernel  function  of  the  local  cluster  distribution. 
For  the  model  of  global  class  distributions,  we  denote  the 
Bayesian  prior  for  each  class  by  P(c3r).  Then  the  sufficient  sta¬ 
tistics  according  to  the  Bayes’  rule,  are  the  posterior  probability 
P(ur\xi)  given  a  particular  observation  Xi 


mm = ssimm 

p{Xi) 

where  p(xi)  =  P(ur)f(xi\wr). 


(6) 


B.  Class  Distribution  Learning 

Class  distribution  learning  addresses  the  combined  estima¬ 
tion  of  regional  parameters  (71*,  Ok)  and  detection  of  the  struc¬ 
tural  parameter  Kr  and  the  kernel  shape  of  g(>)  in  (5)  based  on 
the  observations  xr.  One  natural  criterion  used  for  learning  the 
optimal  parameter  values  is  to  minimize  the  distance  between 
the  SFMD,  denoted  by  /r(u),  and  the  class  data  histogram,  de¬ 
noted  by  f*r(u)  [17].  In  this  paper,  we  use  relative  entropy 
(Kullback-Leibler  distance),  suggested  by  information  theory 
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[25],  as  the  distance  measure  (for  simplicity  we  use  /r(u)  to 
denote  f(u\ur)  in  our  formulation),  given  by 

We  have  previously  shown  that  when  relative  entropy  is  used  as 
a  distance  measure,  the  distance  minimization  method  is  equiv¬ 
alent  to  the  soft-split  classification-based  method  under  the  cri¬ 
terion  of  maximum  likelihood  (ML)  [23]. 

Another  important  issue  concerning  unsupervised  distribu¬ 
tion  learning  is  the  detection  of  the  structural  parameters  of 
the  class  distribution,  called  model  selection  [15].  The  objec¬ 
tive  here  is  to  propose  a  systematic  strategy  for  determining  the 
optimal  number  and  kernel  shape  of  local  clusters,  when  the 
prior  knowledge  is  not  available.  This  is  indeed  the  case  when 
the  structure  of  the  mass  lesion  patterns  for  a  particular  type  of 
cancer  may  be  arbitrarily  complex,  so  correct  identification  of 
the  database  structure  is  very  important.  Thus,  it  will  be  desir¬ 
able  to  have  a  neural  network  structure  that  is  adaptive,  in  the 
sense  that  the  number  and  kernel  shape  of  local  clusters  are  not 
fixed  beforehand.  In  this  paper,  we  applied  two  popular  infor¬ 
mation  theoretic  criteria,  i.e.,  the  Akaike  information  criterion 
and  minimum  description  length  to  guide  the  model  selection 
procedure  [24]. 

As  the  counterpart  for  adaptive  model  selection,  there  are 
many  numerical  techniques  to  perform  ML  estimation  of  cluster 
parameters  [17].  For  example,  EM  algorithm  first  calculates  the 
posterior  Bayesian  probabilities  of  the  data  through  the  observa¬ 
tions  and  the  current  parameter  estimates  (i?-step)  and  then  up¬ 
dates  parameter  estimates  using  generalized  mean  ergodic  the¬ 
orems  (M-step).  The  procedure  cycles  back  and  forth  between 
these  two  steps.  The  successive  iterations  increase  the  likelihood 
of  the  model  parameters.  The  scheme  provides  winner-takes-in 
probability  (Bayesian  “soft”)  splits  of  the  data,  hencb  allowing 
the  data  to  contribute  simultaneously  to  multiple  clusters.  For 
the  sake  of  simplicity,  we  assume  the  kernel  shape  of  local  clus¬ 
ters  to  be  a  multidimensional  Gaussian  with  mean  /Zfcr  and  vari¬ 
ance  Tfcr.  We  summarize  the  EM  algorithm  as  follows. 

1)  E-Step:  for  training  sample  t  =  1,  . . . ,  AT,  compute 
the  probabilistic  membership 
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2)  M-Step:  compute  the  updated  parameter  estimates 
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C.  Decision  Boundary  Learning 

The  objective  of  data  classification  is  to  realize  the  class 
membership  Ur  for  each  data  points  based  on  the  observation 
Si  and  the  class  statistics  {P(u/r),  f(u\ur)}.  It  is  well  known 
that  the  optimal  data  classifier  is  the  Bayes  classifier  since 
it  can  achieve  the  minimum  rate  of  classification  error  [26]. 
Measuring  the  average  classification  error  by  the  mean  squared 
error  E ,  many  previous  researchers  have  shown  that  minimizing 
E  by  adjusting  the  parameters  of  class  statistics  is  equivalent  to 
directly  approximating  the  posterior  class  probabilities  when 
dealing  with  the  two  class  problem  [13],  [26],  In  general,  for  the 
multiple  class  problem  the  optimal  Bayes  classifier  (minimum 
average  error)  classifies  input  patterns  based  on  their  posterior 
probabilities:  input  is  classified  to  class  ur  if 

P(ur\xi)  >  P{ujj\xi)  (12) 

for  all  j  ^  r.  It  should  be  noted  that  in  the  formulation  of  classi¬ 
fier  design,  the  optimal  criterion  used  for  the  future  data  classi¬ 
fication  has  been  intuitively  and  directly  applied  to  the  learning 
of  class  statistics  from  the  training  data  set. 

Direct  learning  of  posterior  probability  is  a  complex  task. 
Great  effort  has  been  made  in  designing  the  classifier  as  an 
estimator  of  the  posterior  class  probability  [19].  By  closely  in¬ 
vestigating  the  global  class  distribution  modeling,  we  found  that 
the  classifier  design  for  data  classification  can  be  dramatically 
simplified  at  the  learning  stage.  Revisit  (6),  since  the  class  prior 
probability  P(3r)  is  a  known  parameter  when  a  supervised 
learning  is  applied,  the  posterior  class  probability  P(tSr\xi)  can' 
be  obtained  without  any  further  effort.  Thus,  by  conditioning 
P{uJr)y  the  problem  is  formulated  as  a  supervised  classification 
learning  of  the  class .  conditional  likelihood  density  f(u\ujr). 
Thus,  an  efficient  supervised  algorithm  to  learn  the  class 
conditional  likelihood  densities  called  the  “decision-based 
learning”  [21]  is  adopted  in  this  paper.  The  decision-based 
learning  algorithm  uses  the  misclassified  data  to  adjust  the  den¬ 
sity  functions  f(u\ujr)>  which  are  initially  obtained  using  the 
unsupervised  learning  scheme  described  previously,  so  that  the 
minimum  classification  error  can  be  achieved.  Define  the  rth 
class  discriminant  function  0r(x*,  w)  to  be  P(Sr)f(xi\ujr). 
Given  a  set  of  training  patterns  X  =  {£*;  £  =  1,2,...,  M}. 
The  set  X  is  further  divided  into  the  “positive  training  set” 
X+  =  { Xi ;  Xi  €  dfr,£  =  1,2,...,  N}  and  the  “negative 
training  set”  X“  =  {^;  £*  Pl  £  =  N+ 1,  iV  +  2,  . . . ,  M}. 
If  the  misclassified  training  pattern  is  from  positive  training 
set,  reinforced  learning  will  be  applied.  If  the  training  pattern 
belongs  to  the  negative  training  set,  we  anti-reinforce  the 
learning,  i.e.,  pull  the  kernels  away  from  the  problematic 
regions.  The  boundary  refinement  is  summarized  as  follows: 

Reinforced 

Learning:  wO'^1)  =  w(j)  +  ?7Z'(d(i))V0(x(£),  w) 

Antireinforced 

Learning:  —  77i/(d(t))V^(x(f),  w) 

(13) 

PMNN  is  a  probabilistic  modular  network  designed  espe¬ 
cially  for  data  classification  where  a  Bayesian  decomposition  of 
the  learning  process  provides  a  unique  opportunity  to  optimize 
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Fig.  4.  The  structure  of  the  PMNN. 

the  structure  of  training  scheme  [14],  [22].  Since  the  information 
about  class  population  is,  in  general,  physically  uncorrelated 
with  the  conditional  features  about  the  individual  class,  a  decou¬ 
pled  two-step  training,  in  terms  of  both  network  structure  and 
learning  rule,  makes  much  more  sense  than  that  in  the  conven¬ 
tional  posterior-type  neural  networks,  i.e.,  the  conditional  like¬ 
lihood  of  each  class  and  the  class  Bayesian  prior  should  be  ad¬ 
justed  separately  in  the  classification  spaces.  Thus,  PMNN  con¬ 
sists  of  several  disjoint  subnets  and  a  winner-takes-all  network. 
The  subnet  outputs  of  the  PMNN  are  designed  to  model  the  like¬ 
lihood  functions  (likelihood-type  network)  which  are  first  esti¬ 
mated  from  equally  presented  class  samples,  and  the  final  de¬ 
cision  boundaries  are  determined  simply  weighting  the  likeli¬ 
hood  by  the  class  populations.  For  a  M-classification  problem, 
PMNN  contains  M  different  class  subnets,  each  of  which  rep¬ 
resents  one  data  class  in  the  database.  Within  each  subnet,  sev¬ 
eral  neurons  (or  clusters)  are  applied  in  order  to  handle  prob¬ 
lems  which  have  complicated  decision  boundaries.  The  outputs 
of  class  subnets  are  fed  into  a  winner-take-all  network.  The 
winner-take-all  network  categorizes  the  input  pattern  to  the  data 
class  whose  subnet  produces  the  highest  output  value. 

The  structure  of  the  PMNN  used  in  this  study  is  shown  in 
Fig.  4.  The  PMNN  consists  of  two  subnets.  Within  each  subnet, 
there  are  several  neurons  (or  clusters).  The  outputs  of  class  sub¬ 
nets  are  fed  into  a  probability  winner  processor,  which  catego¬ 
rizes  the  input  pattern  to  the  data  class  whose  subnet  produces 
the  highest  probability  value.  The  training  scheme  of  the  PMNN 
is  based  on  the  unsupervised  learning.  Each  subnet  is  trained 
individually,  and  no  mutual  information  across  the  classes  may 
be  utilized.  In  our  study,  one  modular  expert  is  trained  to  de¬ 
tect  true  masses,  and  the  other  is  trained  to  detect  false  masses. 
After  training,  the  feature  vectors  extracted  from  ROIsub  are 
entered  to  this  network  to  classify  true  or  false  masses.  In  both 
training  and  testing  processes,  we  assume  that  the  feature  vec¬ 
tors  Xi  in  class  r  (r  =  1,  . . . ,  M)  is  a  mixture  of  multidimen¬ 
sional  Gaussian  distributions,  i.e., 

Kr 

f{Xi\QT)  =  TkrPk(Xi\c3r)  (14) 

k—1 


where  71  kr  =  1  and  Pk(ur)  =  N(jlkT ,  Tkr)  is  a  multi¬ 

dimensional  Gaussian  distribution  within  cluster  k  of  class  r. 

IV.  Interactive  Visual  Explanation 

In  order  to  improve  the  utility  of  the  CAD  systems  in  clinical 
practice,  an  IUI  is  highly  desired.  Different  from  many  previ¬ 
ously  proposed  approaches,  we  have  organized  our  database 
from  both  mathematical-localized  and  radiologist-selected 
mass-like  cases,  and  formed  the  featured  knowledge  database 
based  on  both  mathematical-based  and  radiologist-selected 
image  features.  This  off-line  effort  should  enhance  the  per¬ 
formance  of  the  machine  observer  through  better  quality  of 
training  set  and  optimal  design  of  neural  network  architecture. 
Our  experience  has  suggested,  however,  that  further  improve¬ 
ment  of  CAD.  systems  requires  on-line  natural  integration  of 
human  intelligence  with  the  computer’  output,  since  human 
perception  has  and  can  play  an  important  role  in  the  clinical 
decision  making.  In  this  research,  we  have  pilot  developed  an 
IUI  where  the  major  functions  include:  1)  interactive  visual 
explanation  of  the  CAD  decision  making  process;  2)  on-line 
retrieval  of  the  optimally  displayed  raw  data  and/or  similar 
cases;  and  3)  supervised  upgrade  of  the  knowledge  database  by 
radiologist-driven  input  of  the  “unseen”  and/or  “typical”  cases. 
Our  preliminary  studies  have  shown  that  the  visual  presentation 
of  both  raw  data  and  CAD  results  to  radiologists  may  provide 
visual  cues  for  improved  decision  making. 

As  a  step  toward  understanding  the  complex  information 
from  data  and  relationships,  structural  and  discriminative 
knowledge  reveals  insight  that  may  prove  useful  in  data 
mining.  Hierarchical  minimax  entropy  modeling  and  proba¬ 
bilistic  principal  component  projection  are  proposed  for  data 
explanation,  which  is  both  statistically  principled  and  visually 
effective  at  revealing  all  of  the  interesting  aspects  of  the  data 
set.  The  methods  involve  multiple  use  of  standard  finite  normal 
mixture  models  and  probabilistic  principal  component  projec¬ 
tions.  The  strategy  is  that  the  top-level  model  and  projection 
should  explain  the  entire  data  set,  best  revealing  the  presence 
of  clusters  and  relationships,  while  lower-level  models  and 


projections  should  display  internal  structure  within  individual 
clusters,  such  as  the  presence  of  subclusters  and  attribute  trends, 
which  might  not  be  apparent  in  the  higher-level  models  and 
projections.  With  many  complementary  mixture  models  and 
visualization  projections,  each  level  will  be  relatively  simple 
while  the  complete  hierarchy  maintains  overall  flexibility  yet 
still  conveys  considerable  structural  information.  In  particular, 
a  probabilistic  principal  component  neural  network  is  devel¬ 
oped  to  generate  optimal  projections,  leading  to  a  hierarchical 
visualization  algorithm.  This  algorithm  allows  the  complete 
data  set  to  be  analyzed  at  the  top  level,  with  best  separated 
subclusters  of  data  points  analyzed  at  deeper  levels. 

Research  evidence  suggests  that  for  analysis  of  complex  and 
high-dimensional  data  sets,  structure  decomposition  and  dimen¬ 
sionality  reduction  are  the  natural  strategies  in  which  the  model- 
based  approach  and  visual  explanation  have  proven  to  be  pow¬ 
erful  and  widely-applicable  [27].  However,  there  is  a  trade-off 

(dimensionality  reduction)  the  entropy  of  the  system.  In  this 
research,  a  minimax  entropy  approach  is  adopted  through  the 
use  of  progressive  model  identification  and  principal  compo¬ 
nent  projection.  The  complete  visual  explanation  hierarchy  is 
generated  by  performing  principal  projection  (dimensionality 
reduction)  and  model  identification  (structure  decomposition) 
in  two  iterative  steps  using  information  theoretic  criteria,  EM  al¬ 
gorithm,  and  probabilistic  principal  component  analysis  (PCA). 
Hierarchical  probabilistic  principal  component  visualization  in¬ 
volves:  1)  evaluation  of  posterior  probabilities  for  mixture  data 
set;  2)  estimation  of  multiple  principal  component  axes  from 
probabilistic  data  set;  and  3)  generation  of  a  complete  hierarchy 
of  visual  projections. 

Suppose  the  data  space  is  d-dimensional  with  coordinates 
Vi,  ■  ■  ■ ,  Vd  and  the  data  set  consists  of  a  set  of  d-dimensional 
vectors  {t»}  where  i  =  1,  . . . ,  N.  Now  consider  a  three-di¬ 
mensional  (3-D)  latent  space  x  =  (®1(  x2,  x3)T  together  with 
a  linear  function  which  maps  the  latent  space  to  the  data  space  by 
y  =  Wx+b  where  W  is  a  dx  3  matrix  and  bis  a  d-dimensional 
mean  vector.  If  we  introduce  a  probability  distribution  p(x)  over 
the  latent  space  given  by  a  Gaussian  estimated  from  the  latent 
variables  {xi},  then  a  similar  full-dimensional  Gaussian  distri¬ 
bution  in  data  space  can  be  defined  by  convolving  this  distri¬ 
bution  with  a  general  diagonal  Gaussian  conditional  probability 
distribution  p(t|x,  Aj)  in  data  space  where  Ad  is  the  covariance 
matrix,  resulting  in  a  final  form  of 

P(*)  =  J  p(t\x)p(x)dx  (15) 

where  the  log  likelihood  function  for  this  model  is  given  by  L  = 
Ei  Suppose  W  is  determined  by  the  PCA,  ML  can  be 

used  to  fit  the  model  to  the  data  and  hence  determine  values  for 
the  parameters  b  and  Ad  [27],  Using  a  soft  clustering  of  the  data 
set  and  multiple  PCAsub  corresponding  to  the  clusters,  a  mix¬ 
ture  of  latent  models  takes  the  form  of  p(t)  =  Ef=i  nkp(t\k) 
where  K0  is  the  number  of  components  in  the  mixture,  and  the 
parameters  xk  are  the  prior  probabilities  corresponding  to  the 
components  p(t|A;).  Each  component  is  an  independent  latent 
model  with  PCA  projection  W*.  and  parameters  bfc  and  Adk. 
This  procedure  can  be  further  extended  to  a  hierarchical  mix- 
ture  model  formulated  by  p(t)  =  £f=i  irk  J\  nj]kp(t\k,  j) 
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Fig.  5.  The  hierarchical  view  of  computed  features  for  mass  and  nonmass 
samples  (Database  A,  see  Table  II). 


where  p(t|A,  j)  again  represent  independent  latent  models  [27]. 
With  a  soft  partitioning  of  the  data  set  via  EM  algorithm,  data 
points  will  effectively  belong  to  more  than  one  cluster  at  any 
given  level.  This  step  is  automatically  available  in  our  approach 
since  the  estimation  of  parent  latent  model  involves  the  calcula¬ 
tion  of  posterior  probabilities  denoted  by  zik.  Thus,  the  effective 
input  values  are  Z{kXi  for  an  independent  visualization  space 
k,  corresponding  to  the  visualization  space  k  in  the  hierarchy. 
It  should  be  emphasized  that  probabilistic  means  both  neural 
network  based  learning  and  posterior  probability  weighted  in¬ 
puts.  Further  projections  can  again  be  performed  by  using  the 
effective  input  values  Zikz^kti  for  the  visualization  subspace 
j .  Fig.  5  shows  the  hierarchical  view  of  computed  features  for 
mass  and  nonmass  samples.  In  Fig.  5,  a  hierarchical  visualiza¬ 
tion  view  of  a  high  dimensional  feature  data  set  was  gener¬ 
ated  using  hierarchical  data  visualization  algorithm.  One  hun¬ 
dred  and  25  real  cases  were  involved,  among  them  75  are  mass 
sites,  50  are  nonmass  sites.  Nine  features  were  computed  on  125 
cases.  The  dimension  of  the  resulted  feature  data  set  became  125 
x  9  (Database  A,  see  Table  II).  Hierarchical  visualization  tool 
enables  the  visualization  of  high  dimensional  data  set  through 
dimension  reduction  and  data  modeling  so  that  data  distribution 
features  of  the  data  set  can  be  well  recognized.  For  instance,  the 
clusters  and  subclusters  of  mass  and  nonmass  data  points  and  the 
boundaries  of  the  clusters  can  be  revealed  for  further  research 
purpose. 

In  the  use  of  a  hierarchical  minimax  entropy  mixture  model, 
an  interactive  visualization  environment  is  required  to  enable  a 
flexible  computerized  experiment  such  that  a  human-database 
interaction  can  be  performed  effectively.  We  have  developed  an 
interactive  environment  for  visualizing  five-dimensional  (5-D) 
data  sets,  based  on  state-of-the-art  computer  graphics  toolkits 
such  as  object-oriented  OpenGL  and  Openlnventor.  With  a 
sophisticated  set  of  various  kinds  of  simulated  lights,  color 
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TABLE  n 

The  Summary  of  Experimental  Databases 

Database 

Descriptions 

A 

Nine  features  extracted  from  75  mass  sites  and  50  non-mass  sites.  Used  for 
visualizing  hierarchically  projected  high  dimensional  feature  space. 

Result  is  presented  in  Figure  5. 

B 

A  simulated  two-dimensional  feature  space.  Used  to  show  the  effect  of 
model  selection  on  decision  boundary  estimation.  Result  is  shown  in  Figure  6. 

C 

ORL  standard  database.  Used  to  show  the  improvement  of  PMNN  with 
decision-based  learning.  Result  is  discussed  in  the  text. 

D 

The  training  data  set  consisting  of  50  mammograms,  with  50  true  mass  sites 
and  50  false  mass  sites.  Three  most  discriminatory  features  are  extracted.  Used 
for  both  PMNN  training  and  visualization.  Result  is  given  in  Figure  7. 

E 

The  testing  data  set  consisting  of  46  mammograms,  with  23  normal  cases 
and  23  biopsy  proven  mass  cases  with  each  of  them  having  at  least  one 
true  mass  site.  Three  most  discriminatory  features,  the  same 
as  database  D,  are  extracted.  Used  to  test  the  overall  performance  of 
our  CAD  system  prototype  where  the  mass  candidates  were  selected  using 
the  method  reported  in  Part  I,  automatically.  Result 
is  shown  in  Figure  8  and  also  discussed  in  the  text. 

texturing  editors,  und  3-0  manipulator  and  viewers  (we  have 
integrated  3-0  mouse  and  stereo  glass  units  into  our  existing 
system),  our  system  allows  one  to  examine  the  volumetric 
data  sets  with  any  viewpoint  and  dynamically  walk  through  its 
internal  structures  to  better  understand  the  spatial  relationships 
among  clusters  and  decision  surfaces  present.  One  of  the  most 
important  features  in  our  approach  is  to  attach  the  decision  sur¬ 
face  to  the  3-D  probability  cloud  in  support  of  decision  making, 
and  to  link  each  data  point  in  the  visualization  space  to  its  raw 
data  so  that  the  user  can  on-line  retrieve  the  corresponding  raw 
data  such  as  an  original  image  for  interim  decision  making. 

V.  Experimental  Results  and  Discussions 

In  this  section,  we  present  the  experimental  results  using  the 
information  theoretic  criteria  and  PMNNs  to  generate  the  map¬ 
ping  function  of  the  featured  database,  and  the  preliminary  re¬ 
sults  using  the  hierarchical  minimax  entropy  projections  to  con¬ 
duct  visual  explanation  of  the  decision  making.  For  the  valida¬ 
tion  of  the  database  mapping  using  the  proposed  algorithms, 
global  relative  entropy  (GRE)  value  between  the  (SFMD)and 
the  joint  histogram  is  used  as  an  objective  measure  to  evaluate 
the  fitness  of  the  mapping  function.  A  summary  of  the  databases 
we  used  in  our  study  is  presented  in  Table  II. 

As  we  have  discussed  in  Sections  III  and  IV,  model  selection 
is  the  first  and  a  very  important  learning  task  in  mapping  a 
database  and  the  objective  of  the  procedure  is  to  determine 
both  the  number  and  the  kernel  shape  of  local  clusters  in  each 
class.  This  procedure  is  used  not  only  in  the  data  mapping  for 
decision  making  but  also  in  the  structure  decomposition  for 
hierarchical  visual  explanation.  Our  experience  has  suggested 
that  an  incorrect  model  selection  will  affect  the  performance 
of  data-classification  based  decision  making.  For  the  sake  of 
simplicity,  we  discuss  this  conclusion  in  the  following  2-D 
example.  Let  us  form  a  simulated  featured  database  with  two 
major  features  that  well  characterize  the  two  targeted  classes, 
as  it  shown  in  Fig.  6  (Database  B,  see  Table  II).  The  ground 
truth  is  that  class  1  contains  only  one  local  cluster  while  class  2 
contains  two  local  clusters.  With  a  model  selection  procedure 


using  the  proposed  criteria,  the  intrinsic  data  structure  was 
correctly  identified.  According  to  the  principle  of  designing  the 
optimal  structure  of  PMNN  and  visual  explanation  hierarchy, 
the  result  of  these  criteria  also  determines  the  most  appropriate 
number  of  mixture  components  in  the  corresponding  PMNN 
and  projected  cluster  decomposition.  Two  PMNN  with  different 
architecture  orders  were  designed  and  trained  to  determine 
the  classification  boundaries  between  the  two  classes.  The 
classification  results  are  shown  in  Fig.  6(a)  and  (b).  The  result 
in  Fig.  6(a)  is  with  the  right  cluster  number  in  Class  2,  while 
the  result  in  Fig.  6(b)  is  with  the  wrong  cluster  number  in 
Class  2.  From  this  simple  experiment,  we  have  shown  that 
the  decision  boundary  with  the  right  cluster  number  may  be 
much  more  accurate  than  that  with  heuristically  determined 
cluster  number,  since  the  decision  boundary  between  class  1 
and  class  2  will  be  determined  by  four  cross  points  in  the  first 
case  while  in  the  second  case  the  decision  boundary  will  be 
determined  by  only  two  cross  points.  It  should  be  emphasized 
that  the  error  of  data  classification  is  theoretically  controlled 
by  the  accuracy  in  estimating  the  decision  boundaries  between 
classes,  and  the  quality  of  the  boundary  estimates  is  indeed 
dependent  upon  the  correct  structure  of  the  class  likelihood 
function. 

As  we  have  discussed  before,  although  the  knowledge 
database  contains  both  machine-localized  and  human-selected 
cases,  in  clinical  settings  “unseen”  and/or  subtle  cases  con¬ 
tribute  the  major  false  positives.  We  have  also  pilot  tested  the 
PMNN  method  to  the  so-called  “ M  +  1  classes  problem, 
in  which  the  disease  pattern  under  testing  could  be  either 
from  one  of  the  M  classes,  or  from  some  other  unknown 
classes  (the  “unknown”  class  or  the  “intruder”  class).  Note  that 
the  unknown  class  probability  is  often  very  hard  to  estimate 
because  of  the  lack  of  sufficient  training  samples  (for  example, 
in  the  mass  detection  problem,  the  unknown  classes  include  the 
ROIsub  over  the  normal  tissues).  In  our  experiment,  PMNN 
uses  different  decision  rule  from  that  of  the  “M  classes 
problem:  pattern  £»  belongs  to  class  r  if  both  of  the  following 
conditions  are  true:  a)  <t>(6jr,  Xi)  >  xt),  Wj  r,  and  b) 

<f>(wr,  Si)  >  T.T  is  a  threshold  obtained  by  decision-based 
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(a)  '  '  (b) 


Fig.  6.  The  classification  examples  with  a  two-dimensional  (2-D)  simulated  database  (Dabase  B,  see  Table  II).  (a)  Class  2  contains  two  local  clusters,  (b)  Class 
2  contains  one  local  cluster.  •  *  *' 


learning.  Otherwise  pattern  x \  belongs  to  the  unknown  class. 
We  observed  consistent  and  significant  improvement  in  classifi¬ 
cation  results  compared  with  the  pure  Bayesian  decision.  Using 
the  ORL  (Olivetti  Research  Laboratory,  Cambridge,  U.K.) 
standard  database  (Database  C,  see  Table  II),  our  experience 
has  shown  an  increase  of  correct  detection  rate  fixup  70%  to 
90%  [14]. 

In  the  third  experiment,  we  use  the  proposed  classifier  to  dis¬ 
tinguish  true  masses  from  false  masses  based  on  the  features 
extracted  from  the  suspected  regions.  The  objective  is  to  reduce 
the  number  of  suspicious  regions  and  identify  the  true  masses. 
150  mammograms,  each  of  them  contains  at  least  one  mass 
case  of  varying  size  and  location,  were  selected  in  our  study. 
The  areas  of  suspicious  masses  were  identified  following  the 
proposed  procedure  with  biopsy  proven  results.  Fifty  mammo¬ 
grams  with  biopsy  proven  masses  were  selected  from  the  150 
mammograms  for  training  (Database  D,  see  Table  II),  The  mam¬ 
mogram  set  used  for  testing  contained  46  single- view  mam¬ 
mograms:  23  normal  cases  and  23  with  biopsy  proven  masses 
(Database  E,  see  Table  II)  which  were  also  selected  from  the  150 
mammograms.  All  mammograms  were  digitized  with  an  image 
resolution  of  100  pm  x  100  pm/pixel  by  the  laser  film  digitizer 
(Model:  Lumiscan  150).  The  image  sizes  are  1792  x  2560  x 
12  bpp.  For  this  study,  we  shrunk  the  digital  mammograms  with 
the  resolution  of  400  pm  by  averaging  4x4  pixels  into  one 
pixel.  According  to  radiologists,  the  size  of  the  small  masses  is 
3-15  mm.  The  middle  size  of  masses  is  15-30  mm.  The  large 
size  of  masses  is  30-50  mm,  which  are  rare  in  mammograms. 
A  3-mm  object  in  an  original  mammogram  occupies  30  pixels 
in  a  digitized  image  with  a  100-pm  resolution.  After  reducing 
the  image  size  by  four  times,  the  object  will  occupy  the  range 
of  about  seven  to  eight  pixels.  The  object  with  the  size  of  seven 
pixels  is  expected  to  be  detectable  by  any  computer  algorithm.  > 


Therefore,  the  shrinking  step  is  applicable  for  mass  cases  and 
can  save  computation  time. 

After  the  segmentation,  the  area  index  feature  was  first  used 
to  eliminate  the  nonmass  regions.  In  our  study,  we  set  A\  = 
7x7  pixels  and  A  2  =  75  x  75  pixels  as  the  thresholds.  Ax 
corresponds  to  the  smallest  size  of  masses  (3  mm),  and  an  ob¬ 
ject  with  a  area  of  75  x  75  pixels  corresponds  to  30  mm  in  the 
original  ,  mammogram.  This  indicates  that  the  scheme  can  de¬ 
tect  all  masses  with  sizes  up  to  30  mm.  Masses  larger  than  30 
mm  are  rare  cases  in  the  clinical  setting.  When  the  segmented 
region  satisfied  the  condition  Ai  <  A  <  A2,  the  region  was 
considered  to  be  suspicious  for  mass.  For  the  purpose  of  repre¬ 
sentative  demonstration,  we  have  selected  a  3-D  feature  space 
consisting  of  compactness  I,  compactness  II,  and  difference  en¬ 
tropy.  According  to  our  investigation,  these  three  features  have 
the  better  separation  (discrimination)  between  the  true  and  false 
mass  classes.  It  should  be  noticed  that  the  feature  vector  can 
easily  extend  to  higher  dimensionality.  A  training  feature  vector 
set  was  constructed  from  50  true  mass  ROIsub  and  50  false  mass 
ROIsub  (Database  D,  see  Table  II).  The  training  set  was  used  to 
train  two  modular  probabfiistic  decision-based  neural  networks 
separately.  In  addition  to  the  decision  boundaries  recommended 
by  the  computer  algorithms,  a  visual  explanation  interface  has 
also  been  integrated  with  3-D  to  2-D  hierarchical  projections. 
Fig.  7(a)  shows  the  database  map  projection  with  compactness 
definition  I  and  difference  entropy.  Fig.  7(b)  shows  the  data¬ 
base  map  projection  with  compactness  definition  II  and  differ¬ 
ence  entropy.  Our  experience  has  suggested  that  the  recogni¬ 
tion  rate  with  compactness  I  are  more  reliable  than  that  with 
compactness  II.  In  order  to  have  more  accurate  texture  informa¬ 
tion,  the  computation  of  the  second-order  joint  probability  ma¬ 
trix  pd7  $  (z,  j )  is  only  based  on  the  segmented  region  of  the  orig¬ 
inal  mammogram.  For  the  shrunk  mammograms,  we  found  that 
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Fig.  7.  The  data  mapping  results  (Database  D,  see  Table  II).  -o-  denotes  true  mass  cases;  denotes  false  mass  cases,  (a)  The  mapping  using  compactness  I.  (b) 
The  mapping  using  compactness  II. 


Fig.  8.  One  example  of  the  mass  detection  using  the  proposed  approach  (Database  E,  see  Table  II). 


the  difference  entropy  had  better  discrimination  with  d  =  1 .  The 
difference  entropy  used  in  this  study  was  the  average  of  values 
at  9  =0°,  45°,  90°,  and  135°. 

We  have  conducted  a  preliminary  study  to  evaluate  the  per¬ 
formance  of  the  algorithms  in  real  case  detection,  in  which  6-15 
suspected  masses/mammogram  were  detected  and  required  fur¬ 
ther  clinical  decision  making.  We  found  that  the  proposed  clas¬ 
sifier  can  reduce  the  number  of  suspicious  masses  with  a  sensi¬ 
tivity  of  84%  at  1.6  false  positive  findings/mammogram  based 
on  the  testing  data  set  containing  46  mammograms  (23  of  them 


have  biopsy  proven  masses)  (Database  E,  see  Table  II).  Fig.  8 
shows  a  representative  mass  detection  result  on  one  mammo¬ 
gram  with  a  stellate  mass.  After  the  enhancement,  ten  regions 
with  brightest  intensity  were  segmented.  Using  the  area  crite¬ 
rion,  too  large  and  too  small  regions  were  eliminated  first  and 
the  rest  regions  were  submitted  to  the  PMNN  for  further  eval¬ 
uation.  The  results  indicated  that  the  stellate  mass  lesion  was 
correctly  detected. 

For  further  evaluation,  receiver  operating  characteristic 
(ROC)  method  may  be  employed.  However,  we  do  not  feel 
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ROC  analysis  will  provide  really  a  better  evaluation  but  an 
alternative  method  to  this  case.  First,  most  ROC  analysis 
reported  by  others  were  based  on  different  database  thus  are 
not  comparable  since  ROC  results  are  highly  data-dependent. 
Second,  ROC  analysis  only  indicate  an  “overall”  performance 
with  limitations  at  least  in  twofold:  it  is  for  multithreshold  thus 
the  corresponding  system  may  not  be  optimal  to  a  particular 
application  where  only  one  threshold  is  needed;  and  it  cannot 
provide  a  mathematically  traceable  feedback  to  improve  the 
performance  of  the  system  or  the  one  component  in  the  system. 
Third,  currently  used  FROC  analysis  package  imposes  several 
assumptions  on  the  distributions  of  the  cases  which  are  invalid 
in  most  applications  and  particularly  untrue  in  our  situation.  For 
example,  our  assumptions  about  the  data  distributions  is  SFNM 
that  is  clearly  different  from  the  restricted  conditions  imposed 
by  the  application  of  existing  FROC  analysis  algorithm.  In  our 
approach,  a  quantitative  mapping  of  the  knowledge  database 
is  performed  with  hierarchical  SFMD  modeling  and  should 
be  perfectly  (at  least  in  the  theoretical  sense)  carried  out  by 
the  corresponding  PMNN  classifier.  In  other  words,  optimal 
decision  making  should  have  already  been  achieved  according 
to  the  Bayesian  rule.  It  is  reasonable  to  acknowledge  that 
in  order  to  compare  the  overall  performance  with  the  other 
systems,  an  ROC  study  may  be  further  conducted.  We  are 
currently  working  on  developing  a  new  generation  of  FROC 
analysis  package  with  a  caution  to  remove  the  forementioned 
problems. 

Another  important  consideration  with  the  present  approach 
is  the  measure  of  quality  in  visual  explanation  [29].  This  is  not 
a  glamorous  area,  but  progress  in  this  area  is  eminently  critical 
to  the  future  success  of  visual  exploration  [28].  What  is  the  cor¬ 
rect  matrix  for  a  direct  projection  of  a  particular  multimodal  data 
set?  How  effective  was  a  particular  visualization  tool?  Did  the 
user  come  to  the  correct  conclusion?  It  may  be  agreeable  that 
the  benchmark  criteria  in  visual  exploration  are  very  different 
and  difficult  [28].  As  shared  by  Bishop  and  Tipping  [27],  we 
believe  that  in  data  visualization  there  is  no  objective  measure 
of  quality,  and  so  it  is  difficult  to  quantify  the  merit  of  a  partic¬ 
ular  data  visualization  technique,  and  the  effectiveness  of  such 
a  techniques  is  often  highly  data-dependent.  The  possible  alter¬ 
native  is  to  perform  a  rigorous  psychological  evaluation  using 
simple  and  controlled  environment,  or  to  invite  domain  experts 
to  direct  evaluate  the  efficacy  of  the  algorithm  for  a  specified 
task.  For  example,  we  can  compare  the  domain  expert’s  perfor¬ 
mances  with  and  without  the  system  aid.  In  that  case,  the  ROC 
method  may  be  used  to  evaluate  the  performance  of  our  algo¬ 
rithm  when  used  by  the  radiologists.  While  the  optimality  of 
these  new  techniques  is  often  highly  data-dependent,  we  would 
expect  the  hierarchical  visualization  model  to  be  a  very  effective 
tool  for  the  data  visualization  and  exploration  in  many  applica¬ 
tions. 

In  summary,  we  employed  a  mathematical  feature  extraction 
procedure  to  construct  the  featured  knowledge  database  from 
all  the  suspicious  mass  sites  localized  by  the  enhanced  segmen¬ 
tation.  The  optimal  mapping  of  the  data  points  was  then  ob¬ 
tained  by  learning  the  generalized  normal  mixtures  and  decision 
boundaries.  A  visual  explanation  of  the  decision  making  was 
further  invented  as  a  decision  support,  based  on  an  interactive 


visualization  hierarchy  through  the  probabilistic  principal  com¬ 
ponent  projections  of  the  knowledge  database  and  the  localized 
optimal  displays  of  the  retrieved  raw  data.  A  prototype  system 
was  developed  and  pilot  tested  to  demonstrate  the  applicability 
of  this  framework  to  mammographic  mass  detection. 
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Abstract — Quantitative  analysis  of  magnetic  resonance  (MR) 
images  is  a  powerful  tool  for  image-guided  diagnosis,  monitoring, 
and  intervention.  The  major  tasks  involve  tissue  quantification  and 
image  segmentation  where  both  the  pixel  and  context  images  are 
considered.  To  extract  clinically  useful  information  from  images 
that  might  be  lacking  in  prior  knowledge,  we  introduce  an  unsu¬ 
pervised  tissue  characterization  algorithm  that  is  both  statistically 
principled  and  patient  specific.  The  method  uses  adaptive  standard 
finite  normal  mixture  and  inhomogeneous  Markov  random  field 
models,  whose  parameters  are  estimated  using  expectation-max¬ 
imization  and  relaxation  labeling  algorithms  under  information 
theoretic  criteria.  We  demonstrate  the  successful  applications  of 
the  approach  with  synthetic  data  sets  and  then  with  real  MR  brain 
images. 

Index  Terms — Finite  normal  mixture,  image  segmentation,  in¬ 
formation  theoretic  criteria,  patient  site  model,  tissue  quantifica¬ 
tion. 


I.  Introduction 

UANTITATIVE  analysis  of  magnetic  resonance  (MR) 
images  refers  to  the  problem  of  estimating  tissue  quanti¬ 
ties  and  segmenting  the  anatomy  into  contiguous  regions 
of  interest.  The  problem  has  recently  received  much  attention 
largely  due  to  the  improved  fidelity  and  resolution  of  MR 
imaging  systems,  and  the  effective  clinical  utility  of  image 
analysis  and  understanding  in  diagnosis,  monitoring,  and 
intervention  [l]-[4].  For  example,  pathological  studies  show 
that  many  neurological  diseases  are  accompanied  by  subtle 
abnormal  changes  in  brain-tissue  quantities  and  volumes  [2], 
Due  to  the  virtual  impossibility  for  clinicians  to  quantitatively 
analyze  these  pathological  changes  associated  with  specific  dis¬ 
ease  directly  from  MR  images,  considerable  effort  is  required 
to  develop  accurate  image  analysis  algorithms  for  identifying 
and  quantifying  these  changes  in  vivo  [5]— [10],  [12]. 

The  major  tasks  of  MR  image  analysis  involve  tissue  quantifi¬ 
cation  and  image  segmentation.  When  a  stochastic  model-based 
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method  is  adopted,  both  pixel  and  context  images  should  be  con¬ 
sidered.  In  this  paper,  we  introduce  an  unsupervised  image  anal¬ 
ysis  procedure,  which  is  both  statistically  principled  and  patient 
specific,  a  feature  especially  important  in  cases  with  limited  or 
no  prior  knowledge  [4],  [10].  We  assume  that  the  MR  images 
are  single  valued  and  the  anatomy  of  the  site  may  contain  ab¬ 
normalities.  The  method  involves  adaptive  use  of  standard  finite 
normal  mixture  (SFNM)  and  inhomogeneous  Markov  random 
field  (MRF)  models,  whose  parameters  are  estimated  using  the 
fast  expectation-maximization  (EM)  and  modified  iterated  con¬ 
ditional  modes  (MICM)  algorithms  under  a  selected  informa¬ 
tion  theoretic  criterion  [15]— [17].  The  major  difference  of  our 
study  from  previous  research  in  the  area  [l]-[3],  [5],  [10]  is  as 
follows. 

1)  We  introduce  an  adaptive  SFNM  to  model  the  pixel 
images,  where  we  use  information  theoretic  criteria  to 
determine  the  number  of  tissue  types  in  a  given  image 
and  use  the  boundary-associated  pixels  to  fine  tune  the 
model  parameter  values.  This  allows  incorporation  of 
partial  volume  effect  and  tissue  abnormality  into  a  single 
stochastic  model  [3],  [10]. 

2)  We  introduce  an  inhomogeneous  MRF  to  model  the  con¬ 
text  images,  where  we  use  the  entropy  rate  distribution  to 
determine  probabilistic  boundary  sites  in  the  image  and 
the  spatial  discontinuity  levels  to  assign  the  model  param¬ 
eter  values.  This  permits  an  adaptive  construction  of  the 
patient  specific  site  model  for  improving  both  tissue  quan¬ 
tification  and  image  segmentation  [23],  [28]. 

3)  We  introduce  post-global  relative  entropy  (GRE)  for  the 
assessment  of  the  final  image-segmentation  performance. 
Based  on  the  observation  that  the  parameter  values  of  a 
particular  tissue  type  in  the  quantified  and  segmented  im¬ 
ages  should  be  very  close,  this  criterion  suggests  an  in¬ 
direct,  but  objective  approach  for  the  difficult  problem  of 
performance  evaluation  in  image  segmentation  [31]. 

4)  We  introduce  a  fast  EM  formulation,  which  is  histogram 
based  rather  than  concentionally  raw  data  driven.  Exper¬ 
iments  show  its  very  efficient  performance. 

In  Section  II,  we  discuss  the  stochastic  modeling  of  both  pixel 
and  context  images,  and  in  Section  HI,  we  describe  the  methods 
for  tissue  quantification  and  image  segmentation.  This  is  further 
extended  to  demonstrate  the  accuracy  and  reproducibility  of  the 
algorithms  in  Section  IV,  where  we  first  illustrate  the  operation 
of  the  algorithm  along  with  the  results  using  simulated  data  sets, 
and  then  apply  the  algorithm  to  real  MR  brain  images.  Finally, 
relationship  of  our  method  to  other  approaches  and  future  work 
are  discussed  in  Section  V. 
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n.  Image  Modeling 

In  order  to  validate  the  use  of  a  suitable  stochastic  model  for 
MR  image  analysis  with  a  specified  objective,  we  have  studied 
MR  imaging  statistics  and  observed  several  useful  statistical 
properties  of  MR  images  [4].  These  results  are  strongly  sup¬ 
ported  by  the  analysis  of  actual  MR  image  data  [18].  In  particular, 
based  on  the  statistical  properties  of  MR  pixel  images,  where 
pixel  image  is  defined  as  the  observed  gray  level  associated 
with  the  pixel,  use  of  an  SF3STM  distribution  is  justified  to  model 
the  image  histogram,  and  it  is  shown  that  the  SFNM  model 
converges  to  the  true  distribution  when  the  pixel  images  are 
asymptotically  independent  [19].  Furthermore,  by  incorporating 
statistical  properties  of  context  images,  where  context  image  is 
defined  as  the  membership  of  the  pixel  associated  with  different 
tissue  types,  we  have  proposed  an  inhomogeneous  MRF  formu¬ 
lation  to  impose  local  consistency  constraints  on  context  images 
in  terms  of  a  stochastic  regularization  scheme  [20]. 

Assume  that  each  pixel  i  can  be  decomposed  into  pixel  image  X{ 
and  context  image  k  with  i  =  1,  . . . ,  N  [23].  By  ignoring  infor¬ 
mation  regarding  the  spatial  ordering  of  pixels,  we  can  treat  con¬ 
text  images  (i.e.,  pixel  labels)  as  random  variables  and  describe 
them  using  a  multinomial  distribution  with  unknown  parameters 
7Tfc,  where  7Tfc  =  1,0  <  'Kk  <  1.  Thus,  the  sufficient  statis¬ 
tics  are  the  pixel  image  statistics  and  the  number  of  pixels  for  each 
component  [15].  The  marginal  probability  measure  for  any  pixel 
image  can  be  obtained  by  writing  the  joint  probability  density 
of  Xi  and  k,  and  then  summing  the  joint  density  over  all  possible 
outcomes  of  k,  resulting  in  a  sum  of  the  following  general  form 
applicable  to  all  pixels,  according  to  the  Bayes  law 


/(*)■ '=  y>  ~7== —  exp 
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where  pk  and  <r\  are  the  mean  and  variance  of  the  fcth  Gaussian 
kernel.  We  use  K  to  denote  the  number  of  Gaussian  components 
and  r  £  TZSK~l  to  denote  the  model  parameter  vector  that  in¬ 
cludes  /Life,  and  i r*.  for  all  K  components  (i.e.,  the  tissue  types). 
It  is  important  to  note  that  we  consider  the  structural  parameter 
Kin  (1)  to  be  adjustable,  not  a  fixed  constant  as  assumed  in  most 
previous  work  [2],  [3],  [5].  We  believe  that  this  is  an  important 
consideration  in  unsupervised  image  analysis,  better  repre¬ 
senting  the  cases  encountered  in  the  real  clinical  environment. 
For  example,  when  prior  knowledge  regarding  partial  volume 
effect  or  tissue  abnormality  is  not  available  for  a  given  image,  our 
proposed  image  model  will  adapt  its  structure  to  include  all  of  the 
distinct  components  in  the  image  [10],  [13],  [14]. 

In  MR  images,  regions  are  piecewise  smooth,  thus,  pixel 
label  takes  on  discrete  values  and  the  labels  of  nearby  pixels 
are  strongly  correlated.  Using  the  equivalence  between  a 
Gibbs  distribution  and  an  MRF,  it  has  been  shown  that  Gibbs 
distribution  provides  a  joint  probability  measure  for  context 
images  1  in  the  following  form  [2],  [10],  [11],  [23],  [24]: 


where  tf(l)  is  the  energy  function,  and  the  normalizing  con¬ 
stant  Z\  is  the  partition  function.  A  neighborhood  system  can 
be  established  by  specifying  the  clique  function  Uc^(l),  where 
tf(I)  =  YliLi  Vc  (!)•  The  most  typical  configuration  of  the 
MRF  model  is  the  pairwise  interaction  neighborhood  system  in 
which  spatial  neighbors  occur  only  in  pairs,  i.e.,  the  second- 
order  model  [23].  We  define  the  neighborhood  of  pixel  z,  de¬ 
noted  by  di ,  by  opening  a  3  x  3  window  with  pixel  i  being  the 
central  pixel.  The  energy  function  can  then  be  written  as 


N  c 


^)  =  EENUi)] 

i= 1  j€.di 


(3) 


where  {6i }  is  the  Markov  parameter,  c  denotes  clique  with  c  =  4 
for  a  second-order  model,  and /(.,  .)  is  the  indicator  function  [4] 


if  a  =  b 
if  a^b. 


(4) 


By  translating  local  context  information  into  the  energy  func¬ 
tion  of  the  Gibbs  measure,  through  clique  structure  or  Markov 
parameter,  an  inhomogeneous  MRF  model  can  be  established. 
Again,  important  to  note  that,  in  our  formulation,  the  Markov 
parameter  di  in  (3)  is  considered  to  be  shift  variant,  not  a  fixed 
constant,  as  in  most  previous  work  [2],  [10].  We  believe  that 
the  correlation  among  context  images  is  primarily  local  and 
should  be  reflected  by  the  Markov  parameter  values.  For  ex¬ 
ample,  within  one  region,  the  context  exhibits  strong  spatial  con¬ 
tinuity,  while  on  region  boundaries,  the  context  reflects  the  nat¬ 
ural  discontinuity. 

Simply  applying  the  Bayes  law,  we  can  construct  a  unified 
framework  to  integrate  the  pixel  image  model  with  the  context 
image  model.  It  has  been  shown  that  requiring  the  conditional 
independence  of  the  observed  pixel  images,  given  the  context 
images,  is  sufficient  to  ensure  that  the  posterior  distribution  is 
an  MRF  [21],  [11],  [22]— [24] .  That  is,  a  posterior  conditional 
probability  function  for  the  context  images  1,  given  the  observed 
pixel  images  x,  has  the  form  of  a  Gibbs  distribution  given  by 


P(!|x)  =  j—  exP(— P(l|x))  (5) 

Zl|x 

where  Z\ |x  is  a  normalizing  constant.  Based  on  the  pairwise  in¬ 
teraction  neighborhood  system,  the  corresponding  energy  func¬ 
tion  is 


u(  l|x)  = 
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where  we  have  used  (1)  and  (3). 

Equation  (5)  refers  to  a  hidden  MRF,  where  the  local  property 
can  be  derived  as 


P(l)  =  i- exp  (-£/(!)) 


p(Zi|lai,  x)  =  -L  exp  (-U(li\ldi,  x)) 


(2) 
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where  Zi  is  a  normalizing  constant,  ldi  are  local  context  images, 
and 

U(k\l8i,x)=  fill  H*l)  + 

k=  1 

+E  (8) 

j&di  ■ 

The  spatial  statistical  dependence  among  pixel  images  is  one 
of  the  fundamental  issues  in  the  problem  formulation  for  image 
analysis.  In  our  approach,  we  tackle  the  problem  as  follows.  For 
the  purpose  of  tissue  quantification,  maximum  likelihood  (ML) 
estimation  is  used  based  on  the  SFNM  model  given  in  (1).  In 
[19],  we  prove  a  convergence  theorem  showing  that,  when  the 
pixel  images  are  asymptotically  independent,  the  parameter  esti¬ 
mates  based  on  (1 )  converge  to.  their  true  values  with  probability 
one  for  sufficiently  large  N.  For  the  purpose  of  image  segmen¬ 
tation,  a  maximum  a  posterior  probability  (MAP)  approach  is 
employed  to  update  1  according  to  (5).  Since  in  the  formulation 
we  allow  9i  to  be  adjustable  [21],  the  assignment  of  their  values 
will  incorporate  the  unified  correlation  among  both  the  pixel  and 
context  images.  Our  tests  with  a  wide  class  of  simulated  and  real 
data  have  demonstrated  the  plausibility  of  the  approach  [32] .  We 
include  some  of  the  examples  in  Section  IV. 


III.  Method 

A.  Tissue  Quantification 

Tissue  quantification  addresses  the  combined  estimation  of 
regional  parameters  (7Tfc,  /ifc,  <r\)  and  the  detection  of  the  struc¬ 
tural  parameter  K  in  (1)  given  the  pixel  images  x.  The  two 
main  approaches  are  based  on  classification-estimation  and  dis¬ 
tance-minimization  algorithms  [2],  [3],  [6],  [10].  In  the  classi¬ 
fication-estimation  approach,  all  pixels  are  first  classified  into 
different  tissue  types,  and  the  model  parameters  are  then  esti¬ 
mated  using  the  sample  averages.  In  the  distance-minimization 
approach,  the  mixture  density  is  fitted  to  the  histogram  of  pixel 
images  by  finding  the  optimal  parameters.  We  use  relative  en¬ 
tropy  (the  Kullback-Leibler  distance)  for  tissue  quantification, 
a  measure  of  the  information  theoretic  distance  between  the  his¬ 
togram  of  the  pixel  images  /x,  and  the  estimated  SFNM  distri¬ 
bution  f(x),  [25] 

II/)  =  !>(*)  log  (9) 

We  have  shown  that,  when  relative  entropy  is  used  as  the  dis¬ 
tance  measure,  distance  minimization  is  equivalent  to  the  ML 
estimation  of  SFNM  parameters  under  a  pixel  independency  ap¬ 
proximation,  i.e.,£(r)  =  exp(-V[£f(/x)  +  D(/x||/)]),  where 
H  is  the  entropy  calculator  [19]. 

Thus,  the  ML  estimate  can  be  reached  by  minimizing 
D(fx\\f),  e.g.. 


dP(fx\\f) 

<9(71-*,  nk,  Ck) 


(10) 


which  motivates  the  consideration  of  a  fast  implementation  of 
the  EM  algorithm  based  on  the  histogram  /x  (x)  rather  than  the 


raw  data  x,  as  formulated  in  [10]  and  [13].  We  define  the  his¬ 
togram  of  the  data  set  as  the  number  of  times  the  value  ui  occurs 
in  the  observation  x 

N 

/*(ui)  =  EJ(x<’  wj)  on 

for  alii  =  1,  . . . ,  L  with  Yf=  1  /x(u0  =  -AT.  The  new  for¬ 
mulation  is  derived  by  augmenting  the  observed  data  histogram 
fx(ui)  by  splitting  the  value  of  fx(u[)  at  each  level  l  into  K0 
fractions  of  latent  or  unobserved  data  based  on  current  estimates 
via  posterior  Bayes  probability  zik,  as  we  now  sketch  the  fol¬ 
lowing  steps:. 


E-Step 


M-Step 


c)  K-) .  °!(n|) 

%  /(“-K0 , <4n>. <'fn)) 
1  L 

■  (n+ 1)  __  1  M  r  /  \ 

~  at  /  ,Zlk  fx(ul) 


4"+,,=^ 


2(n+l)  _  Z=1 


E4*Vx(«z)( 


«i  -/4n+i)y 


E4n)/x(«o 


(12) 
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(15) 


Although  the  result  expressed  by  (12)— (15)  may  be  intuitively 
obvious,  i.e.,  by  simply  applying  the  expectation  laws  or  reorga¬ 
nizing  conventional  formulation  of  the  EM  algorithm,  it  is  also 
a  formal  result  that  follows  from  the  philosophy  of  the  EM  algo¬ 
rithm  presented  in  [15]  with  a  new  objective  function  D(/x||/). 
The  algorithm  first  calculates  the  posterior  Bayesian  probabili¬ 
ties  of  the  data  through  the  observations  and  the  current  param¬ 
eter  estimates  (i?-step)  and  then  updates  parameter  estimates 
using  generalized  mean  ergodic  theorems  (M-step).  The  pro¬ 
cedure  cycles  back  and  forth  between  these  two  steps  until  it 
reaches  a  stationary  point  of  the  likelihood,  where  n  is  the  iter¬ 
ation  index.  In  this  application,  we  have  experienced  L  <  N> 
thus,  the  new  algorithm  will  requires  O(L)  rather  than  O(N) 
operations  per  iteration.  For  example,  for  most  MR  images,  we 
have  L  =  256  ~  4096  and  N  =  65  536  -  262 144. 

For  an  unsupervised  tissue  quantification  process,  the  number 
of  tissue  types  should  be  detected  as  part  of  the  parameter  es¬ 
timation  procedure  prior  to  image  segmentation.  This  step  is 
called  model  selection,  which  is  particularly  critical  in  clinical 
applications  where  the  abnormality  shown  in  the  images  for  a 
particular  patient  may  be  arbitrarily  complex.  One  approach  to 
determine  the  optimal  number  Kq  is  to  use  information  theoretic 
criteria,  such  as  the  Akaike  information  criterion  (AIC)  [26], 
and  the  minimum  description  length  (MDL)  [27].  The  major 
thrust  of  this  approach  has  been  the  formulation  of  an  adaptive 
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process  in  which  a  model  fitting  procedure  is  utilized  to  select  a 
model  from  several  competing  candidates  such  that  the  selected 
model  best  fits  the  observed  data. 

For  example,  AIC  will  select  the  model  that  gives  the  min¬ 
imum  of 

AIC(J<ra)  =  -2  log  (£(fML))  +  2 Ka  (16) 

where  £(Fml)  is  the  likelihood  of  tml.  the  ML  parameter  es- 
timates,  and  Ka  is  the  number  of  free  adjustable  parameters  in 
the  model.  From  a  quite  different  point-of-view,  MDL  reformu¬ 
lates  the  problem  explicitly  as  an  information  coding  problem, 
in  which  the  best  model  fit  is  measured  such  that  high  probabil¬ 
ities  are  assigned  to  the  observed  data,  while  at  the  same  time, 
the  model  itself  is  not  too  complex  to  describe.  The  model  is 
selected  by  minimizing  the  total  description  length  defined  by 

MDL (Ka)  =  -  log  (£(rML))  +  0.51^0  log  N.  (17) 

Note  that,  different  from  AIC,  the  second  term  in  MDL  takes 
into  account  the  number  of  observations. 

B.  Image  Segmentation 

Image  segmentation  is  a  technique  for  partitioning  the  image 
into  meaningful  regions  corresponding  to  different  objects.  It 
may  be  considered  to  be  a  clustering  process  where  the  pixels 
are  classified  into  attributed  tissue  types  according  to  their 
gray-level  values  and  spatial  correlation.  More  precisely,  image 
segmentation  addresses  the  realization  of  context  images  1, 
given  the  observed  pixel  images  x  [17].  Based  on  the  inhomo¬ 
geneous  MRF  model  given  by  (5),  there  are  several  approaches 
for  performing  the  pixel  classification.  For  example,  we  can 
use  ML  classification  to  directly  maximize  the  individual 
likelihood  function  of  the  pixel  images,  i.e.,  the  first  term  in 
(5),  by  searching  the  optimum  1  where  the  true  pixel  labels  1* 
are  considered  to  be  functionally  independent  constants  [17]. 
The  major  problem  associated  with  this  approach  is  that  the 
classification  error  is  high  when  the  observed  pixel  images  are 
noisy.  However,  it  may  well  function  as  an  initial  solution  since 
the  classification  error  is  spatially  uniformly  distributed.  By 
considering  both  terms  in  (8),  we  derive  the  MICM  algorithm 
to  search  for  a  MAP  image  segmentation.  The  structure  of 
this  relaxation  labeling  procedure  is  based  on  the  following 
two  basic  considerations:  1)  decomposition  of  the  global 
computation  scheme  into  a  network  performing  simple  local 
computations  and  2)  use  of  suitable  local  context  regularities 
in  resolving  ambiguities  [17]. 

Let  e  denote  the  expected  segmentation  error,  i.e.,  the  poste¬ 
rior  cost  of  misclassification  given  pixel  images  x,  based  on  the 
following  equivalence: 

arg|maxP(/i  =  k,  ls|*lx)|  =  arg  jmaxp(Zi  =  k\lS\i,  x)  j 

(18) 

where  l5ji  denotes  pixel  labels  of  all  pixels,  except  pixel  L  Ref¬ 
erences  [22]— [24]  show  that  choosing  the  labeling  that  mini¬ 
mizes  e  is  equivalent  to  maximizing  the  marginal  posterior  dis¬ 
tribution  such  that  label  k  is  updated  to  satisfy 

p(4m+1)|x,  lS|i)  >  p(l\m) |x,  lS|t)  (19) 


for  all  pixels.  Furthermore,  by  imposing  the  Markovian  con¬ 
straint,  we  have  the  following  relationship  [22],  [23]: 

ls|<)  oc  p(zi|(i)p(ii|lai)  oc ,p(k  =  k\xi,  l9i).  (20) 

That  is, 

arg {maxp(Zj|x,  lS|i)|  =  arg|maxp(a:i|ii)p((i|lai)| 

=  arg  jmaxp^lxi,  la.)  j  .  (21) 

Hence,  based  on  the  inhomogeneous  hidden  MRF  formula¬ 
tion,  MICM  algorithm  is  constructed  as  a  computationally  fea¬ 
sible  alternative  to  obtain  an  MAP  solution.  From  (7),  we  can 
show  that  maximizing  the  conditional  probability  is  equivalent 
to  minimizing  the  energy  function  U(U\\di,  x).  That  is,  pixel  i 
will  be  classified  into  the  fcth  region  if 

U  =  arg  |  min  U (U  =  fc|lai ,  x)  j  .  (22) 

The  MICM  algorithm  uses  a  specified  number  of  iterations  by 
randomly  visiting  all  pixels.  At  each  single  cycle  in  the  itera¬ 
tions,  for  the  update  of  h,  we  have 

P(l|x)  =p(Zi|lS|i,  x)P(lS|i|x)  (23) 

which  never  decreases,  hence,  assuring  eventual  convergence 

[22] ,  [23].  This  approach  demonstrates  how  a  network  of  dis¬ 
crete  units  can  be  used  to  search  an  optimal  solution  in  a  problem 
where  incorporation  of  context  constraints  is  important. 

To  perform  image  segmentation  based  on  the  inhomogeneous 
MRF  model  requires  assignment  of  the  Markov  parameter  0; 

[23] .  An  important  area  in  image  analysis  that  recently  started 
to  be  emphasized  is  the  use  of  patient-specific  site  models  to 
guide  various  medical  image  analysis  tasks  [28].  Hence,  in  the 
assignment  of  the  Markov  parameter,  we  first  construct  a  pa¬ 
tient-specific  probabilistic  site  model  directly  from  the  interim 
context  images  (e.g.,  after  ML  classification),  where  the  entropy 
rate  distribution  is  used  as  a  measure  of  the  pixel  label  depen¬ 
dence,  and  then  assign  the  Markov  parameter  values  based  on 
this  model  through  a  predefined  lookup  table.  This  approach  has 
a  clear  physical  interpretation  since  lower  entropy  rate  means 
higher  dependence  and  the  probabilistic  site  model  coincides 
with  the  boundary  allocation  [30]. 

For  a  stationary  random  process  Zi>  Z2,  •  •  . ,  In>  the  entropy 
‘  rate  H(C)  is  defined  as  the  rate  of  entropy  growth  [25].  When 
dealing  with  a  stationary  Markov  chain,  the  entropy  rate  can  be 
calculated  by 

#(£)=  Jim  H{lN\lN-.u...,h) 

=  lim  Ff(/iv|ijv-i) 

IV— ►oo  '  7 

=  H{h\h).  (24) 

Assume  that  the  inhomogeneous  MRF  is  locally  stationary,  then 
the  local  entropy  rate,  i.e.,  the  entropy  rate  distribution,  is  given 
by 

H(£i)  =  YJP(ldi)H(k\ldi).  (25) 

ls< 


(19) 
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Since  the  probability  distribution  of  1^  is  generally  unavailable, 
we  use  first-order  stochastic  approximation  to  estimate  the  en¬ 
tropy  rate  [29],  [30] 

K 

H(Ci)*H{li\\ai)~-YJp{k  =  k\\di)  log p(k  = 

h—\ 

(26) 


By  opening  a  3  x  3  neighborhood  window,  the  conditional  prob¬ 
ability  is  given  by 

p{h  =  *M  =  X)  (27) 

j€di 


It  can  be  shown  that  when  1^  represents  a  uniform  block,  the 
corresponding  entropy  rate  is  zero.  Finally,  we  use  the  following 
lookup  table  to  assign  the  Markov  parameter  values 


H(Ci)  +  rj 


(28) 


where  a  is  the  scale  factor  and  rj  is  the  shifting  offset,  and  these 
coefficients  are  empirically  determined  based  on  our  experi¬ 
ence.  We  refer  to  the  plot  of  0;  as  the  patient-specific  proba¬ 
bilistic  site  model  or  the  site  model. 

Finally,  we  introduce,  in  addition  to  the  iterated  update  of 
context  images  1  using  (22)  and  subsequent  modification  of  the 
Markov  parameter  6  according  to  lookup  table  (28),  the  fine 
tuning  of  the  tissue  conditional  likelihood  densities  through  a 
“classification-based  learning”  [19].  The  classification-based 
learning  uses  the  misclassified  pixels  to  adjust  the  tissue 
density  functions,  which  are  previously  estimated  using  the 
EM  algorithm,  so  that  the  minimum  classification  error  can  be 
achieved.  Integrated  into  the  MICM  algorithm,  the  reinforced 
and  antireinforced  learning  rules  are  used  to  update  the  tissue 
quantities  /z*  and  in  the  first  term  of  (6)  at  each  iteration  as 
follows: 


Reinforced  learning 

(*+i)  __  M  >  (  __  in) 

^(n  +  l)  —  H'jin- M)  I  ^  1  A^(n+1) 


a 


2(n+l)  _  2 (n) 


(n+I)  ~  +  +  K 


(  (n)  \  2  2 (n 

~  ^(n+ 


) 

+1) 


Anti-reinforced  learning 

(n+l)  _  (n) 

i{n)  i(: 

2(n+l)  _  2 (n) 


.(n) 


(») 


—  (7 


x--u!in)'\  -a2{n) 
x%  4<n)  J  ai(n) 


(29) 


where  the  misclassified  pixels  are  those  with  ^n+1^  ^  and 
k  is  the  learning  rate  constant,  typically  chosen  small  (k  C  1). 
It  is  important  to  notice  that,  if  the  pixels  are  reclassified  into 
/jn+^th  tissue  type,  reinforced  learning  will  be  applied  to  pull 
the  corresponding  kernel  closer  to  the  host  region,  while  antire¬ 
inforced  learning  will  be  used  to  push  the  kernel  of  tissue 
type  away  from  the  problematic  region  [19].  The  misclassified 
pixels  associated  with  the  spatial  discontinuities  are  used  to  fine 
tune  the  discriminant  thresholds  in  the  decision  domain.  We 
refer  to  those  misclassified  pixels  as  the  boundary-associated 


(a)  (b) 

Fig.  I.  Simulated  image  with:  (a)  four  Gaussian  components  and  the  (b)  cor¬ 
responding  histogram  of  the  pixel  images  (right). 


pixels  since  significant  partial  volume  effect  most  often  occurs 
along  the  boundaries  [3J. 

IV.  Algorithm  and  Experiment 

Thus  far,  we  have  described  the  theory  behind  the  stochastic 
site  model  supported  image  analysis  scheme  with  tissue  quan¬ 
tification  and  segmentation.  We  now  complete  the  description 
of  our  algorithm  by  considering  the  major  steps  hi  the  imple¬ 
mentation  and  illustrating  its  performance.  Although  the  tissue 
quantification  and  image  segmentation  may  be  simultaneously 
performed  [3],  [5],  [6],  a  more  interesting  effort,  with  greater 
accuracy,  is  to  achieve  these  two  objectives  separately  [4],  [10], 
[23].  Guided  by  the  two  information  theoretic  criteria,  our  algo¬ 
rithm  progressively  proceeds  by  fitting  a  SFNM  to  the  histogram 
of  pixel  images  with  model  order  selection,  and  then  constructs 
a  consistent  relaxation  labeling  of  the  context  images.  A  sum¬ 
mary  of  the  major  steps  is  given  as  follows. 

Step  1)  For  each  value  of  if,  perform  ML  tissue  quantifica¬ 
tion  by  the  EM  algorithm  (12)-(  15)  with  30-50  iter¬ 
ations]. 

Step  2)  Scan  the  values  of  K  —  Km\n,  . . . ,  Kmax>  by  AIC 
(16)  and  MDL  (17)  to  determine  the  suitable  number 
of  tissue  types  K0. 

Step  3)  Select  the  result  of  tissue  quantification  corre¬ 
sponding  to  the  value  of  Kq  determined  in  step  2. 

Step  4)  Initialize  image  segmentation  by  ML  classification. 

Step  5)  Construct  the  probabilistic  site  model  or  site  model 
according  to  the  entropy  rate  distribution  (26),  (27) 
and  the  lookup  table  (28). 

Step  6)  Finalize  image  segmentation  by  MICM  algorithm 
by  implementing  relaxation  labeling  (22)  and  local 
parameter  tuning  (29)  with  5-10  iterations. 

The  performance  of  the  algorithm  is  evaluated  in  terms  of 
the  GRE  value  for  tissue  quantification,  visual  judgement,  and 
post-GRE  value  for  image  segmentation,  as  well  as  computa¬ 
tional  complexity  and  robustness  [4],  [22],  [31]. 

We  first  illustrate  the  application  of  our  algorithm  to  a  simple 
synthetic  data  set  generated  from  a  mixture  of  four  Gaussians  in 
a  two-dimensional  space,  shown  in  Fig.  1(a).  Each  component 
represents  one  tissue  type.  The  value  for  each  of  the  components 
is  set  to  a  constant  value  and  normally  distributed  noise  is  added 
to  the  simulation  phantom  resulting  in  6-dB  SNR.  Fig.  1(b) 


0,5/0.48  |  166/164  400/373 


4  1  0.125/0.15  [  206/201  |  400/463 


shows  the  corresponding  histogram  of  the  pixel  images  (noisy 
curve).  It  is  clear  that  these  four  simulated  tissue  types  are  highly 
overlapping.  We  then  apply  the  two  information  theoretic  cri¬ 
teria  (AIC  and  MDL)  to  detect  the  number  of  the  components. 
The  curves  of  the  AIC  and  MDL,  as  functions  of  the  number 
of  tissue  types  if,  are  plotted  in  Fig.  2(a)  and  (b),  respectively, 
where  the  plots  correspond  to  two  different  noise  levels.  The 
minima  of  these  curves  indicate  the  correct  number  of  image 
components.  It  is  clear  that  the  number  of  tissue  components 
suggested  by  both  criteria  is  correct.  The  results  of  the  distri¬ 
bution  learning  using  the  EM  algorithm  are  shown  in  Fig.  1(b) 
(smooth  curve).  With  the  correct  Kq,  the  GRE  between  the  esti¬ 
mated  SFNM  and  the  image  histogram  is  about  0.008  nats.  The 
numerical  results  are  given  in  Table  I. 

Having  determined  the  parameters  of  the  SFNM  model,  we 
conduct  image  segmentation  where  we  compare  the  perfor¬ 
mance  of  three  different  algorithms.  The  ML  pixel  classification 
is  applied  to  initialize  image  segmentation,  where  the  pixel  is 
classified  into  the  fcth  tissue  component  based  on  the  ML,  i.e., 


The  result  is  shown  in  Fig.  3(a).  The  corresponding  classifica¬ 
tion  error  is  about  30%  and  is  spatially  and  uniformly  distributed 
over  the  whole  image.  We  then  apply  the  ICM  algorithm  to  up¬ 
date  the  pixel  labeling  with  two  settings,  i.e.,  9  =  0.4  and  0  —  6. 
The  results  are  shown  in  Fig.  3(b)  and  (c),  respectively.  It  can 
be  concluded  that  since  the  ICM  algorithm  uses  a  homogeneous 
MRF  configuration,  a  smaller  9  will  lead  to  a  noisy  image  seg¬ 
mentation  with  higher  classification  error,  and  a  larger  9  will 
lead  to  a  smooth  context  recover  with  missing  details.  For  the 
given  simulated  image,  we  determined  9  =  1.5  as  the  best 
choice,  and  with  this  setting,  the  corresponding  classification 
error  is  down  to  0.7508%. 


(a)  (b) 

Fig.  4.  Contextual  image  segmentation  using  a  probabilistic  site  model 
[(a)  highlight  the  edges  of  regions]  (b)  supported  approach. 

Finally,  we  use  the  MICM  algorithm  together  with  the  Markov 
parameter  assignment  procedure  to  improve  image  segmenta¬ 
tion.  Based  on  (26)-(28),  our  intensive  numerical  experiments 
suggest  that  the  assignment  of  9  =  0.4  along  the  boundaries  and 
9  —  1. 5-6.0  within  the  regions,  results  in  good  performance  in 
a  typical  tone  image  [22],  [23].  Using  this  empirical  knowledge 
as  a  guideline,  we  select  a  =  0.5  and  r\  =  0.1  to  construct 
the  lookup  table.  Fig.  4(a)  gives  the  distribution  of  the  Markov 
parameter  values  for  the  simulated  image,  clearly  showing  the 
structure  of  the  boundaries.  The  result  using  the  MICM  algorithm 
is  shown  in  Fig.  4(b),  which  provides  an  almost  “perfect”  true 
context  in  which  the  image  details  are  nicely  preserved  while 
most  noise  effects  have  been  removed.  The  final  classification 
error  is  further  reduced  to  0.3 1 1 3%. 

As  an  example  of  a  more  complex  problem,  we  consider  a 
set  of  real  MR  brain  images  used  for  identification  of  different 
tissue  space  domains  [three-dimensional  (3-D)  spoiled  gradient 
recalled  acquisition  (SPGR)].  A  slice  from  such  a  data  set  is 
shown  in  I|g.  5(a).  This  is  a  T2-weighted  image  parallel  to  the 
ac-PC  line.  The  data  are  acquired  with  a  GE  Sigma  1.5  Tesla 
system.  The  imaging  parameters  are  TR  35,  TE  5,  flip  angle 
45°,  1.5-mm  effective  slice  thickness,  zero  gap,  and  124  slices 
with  in-plane  192  ^  256  matrix,  and  24-cm  field  of  view,  and 
one  number  of  excitation  (nex).  Since  the  skull,  scalp,  and  fat  in 
the  original  brain  images  do  not  contribute  to  the  brain  tissue,  we 
edit  the  MR  images  to  exclude  nonbrain  structures  prior  to  tissue 
quantification  and  segmentation.  A  close  look  of  the  whole  3-D 
data  set  has  indicated  that  the  histogram  has  a  considerably  dif¬ 
ferent  characteristics  from  slice  to  slice  and  the  tissue  types  are 
all  highly  overlapping  [4].  Our  experiment  is  to  assess  the  feasi¬ 
bility  of  the  algorithm  on  model  selection  and  probabilistic  site 
model  construction  with  real  MR  images.  After  the  initial  image 
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(a)  (b) 


Fig.  5.  (a)  Real  MR  brain  image  and  (b)  the  corresponding  probabilistic  site 

model,  where  the  key  feature  is  the  natural  discontinuities  in  the  image. 


>  5  10  t  5  •  9  10  II  12 

(a)  (b) 

Fig.  6.  MDL  curve  of  the  MR  image  in  Fig.  5.  (a)  Global  profile,  (b)  Local 
detail,  where  the  minimum  of  the  MDL  value  corresponds  to  Ko  =  10. 

segmentation  by  the  ML  pixel  classification,  the  corresponding 
site  model  is  given  in  Fig.  5(b).  Applying  both  AIC  and  MDL  to 
this  image,  we  obtain  Ko  =  10.  The  corresponding  MDL  curve 
is  shown  in  Fig.  6(a)  with  a  close-up  around  the  minimum,  as 
shown  in  Fig.  6(b).  Our  experience  has  shown  that  the  overall 
performance  of  these  two  information  theoretic  criteria  is  fairly 
consistent  with  real  MR  brain  images. 

As  a  final  example,  we  consider  tissue  quantification  and 
image  segmentation  of  a  normal  T1 -weighted  MR  brain  image, 
as  shown  in  Fig.  7(a).  As  discussed  in  the  literature  [2],  [3], 
the  brain  tissue  is  generally  composed  of  three  principal  types, 
i.e.,  white  matter  (WM),  gray  matter  (GM),  cerebrospinal  fluid 
(CSF),  and  their  pairwise  combinations.  Santago  and  Gage 
[3]  propose  a  six-tissue  model  to  represent  the  primary  tissue 
types  and  the  partial  volume  effect,  defined  as  CSF-white 
(CW),  CSF-gray  (CG),  and  gray-white  (GW).  In  our  imple¬ 
mentation,  we  also  consider  the  triple  mixture  tissue,  defined 
as  CSF-white-gray  (CWG).  More  importantly,  since  the  MR 
images  clearly  show  distinctive  intensities  at  local  brain  areas, 
the  functional  areas  within  a  tissue  type  need  to  be  considered. 
In  particular,  the  caudate  nucleus  and  putamen  are  two  im¬ 
portant  local  brain  functional  areas.  We  calculate  AIC (K)  and 
MDL(Af)for/f  =  2.  ...,  10.  The  results  with  these  two  criteria 
are  given  in  Table  II,  which  suggest  that  this  brain  image  contains 


Fig.  7.  (a)  T1 -weighted  “normal”  MR  brain  image  and  (b)  the  corresponding 
image  histogram. 


Fig.  8.  Result  of  image  segmentation  using  the  MICM  algorithm  and  local 
parameter  fine  tuning. 

eight  tissue  types.  The  histogram  of  the  pixel  images  is  shown 
in  Fig.  7(b)  (noisy  curve).  With  Ko  —  8,  the  result  of  tissue 
quantification  using  our  algorithm  is  given  by  Fig.  7(b)  (smooth 
curve)  with  a  GRE  value  of 0.02-0.04  nats.  The  actual  parameter 
values  corresponding  to  the  estimation  agree  with  those  of  a 
physician’s  qualitative  analysis  results  [4].  The  MICM-based 
tissue  segmentation  together  with  the  local  parameter  tuning 
(29)  is  performed  and  the  pixel  label  updates  are  terminated  after 
5-10  iterations  since  further  iterations  produced  almost  identical 
results.  The  segmentation  result  are  given  in  Fig.  8.  Although  the 
segmentation  result  contains  some  small  isolated  spots  (less  than 
four-pixel  size),  the  proposed  approach  is  quite  encouraging.  It 
is  seen  that  the  boundaries  of  WM,  GM,  and  CSF  are  delineated 
very  well  and  successfully.  The  regions  with  different  gray  levels 
are  satisfactorily  segmented,  especially  the  major  brain  tissues, 
which  are  clearly  identified  and  represent  the  following  eight 
brain  tissue  types: 

1)  CSF; 

2)  CG; 

3)  CGW; 

4)  GW; 

5)  GM; 

6)  putamen  area; 

7)  caudate  area; 

8)  WM. 
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TABLE  H 

Model  Selection  Using  AIC  and  MDL  Values 


K 

2 

3 

4 

5 

6 

7 

8 

9 

10 

AIC 

295944 

291667 

288845 

288397 

288380 

288184 

285685 

287674 

286317 

MDL 

295967 

291703 

288895 

288461 

h288458 

288275 

285790 

287792 

!  286449 

These  segmented  tissue  types  also  agree  with  the  results  of  radi¬ 
ologists’  evaluation. 

We  have  tested  our  algorithm  with  a  total  of  1 24  MR  brain  im¬ 
ages.  As  illustrated  here  by  both  simulated  and  real  examples, 
we  have  observed  that  the  overall  approach  can  be  very  effective 
at  quantifying  and  segmenting  different  tissue  types.  It  is  impor¬ 
tant  to  emphasize  that  in  unsupervised  image  analysis  with  real 
data  sets,  an  objective  evaluation  of  different  techniques  is  a  par¬ 
ticularly  difficult  task  [6],  [10],  [31].  It  is  difficult  to  quantify 
the  merit  of  a  particular  algorithm,  and  dependability  of  eval¬ 
uations  by  simple  mathematical  measures  is  questionable  [4]. 
Therefore,  most  of  the  time,  the  quality  of  tissue  quantification 
and  image  segmentation  depends  heavily  on  the  subjective  and 
qualitative  judgments.  In  this  paper,  besides  the  evaluation  per¬ 
formed  by  radiologists,  we  use  the  GRE  value  to  measure  the 
quality  of  tissue  quantification.  We  believe  that  this  criterion  is 
quite  reasonable  since  the  ML  estimation  is  unbiased  under  the 
SFNM  modeling  and  image  histogram,  as  the  reference  is  the 
only  subjective  quantity  directly  available  from  the  image  data 
[15],  [29].  Furthermore,  for  the  assessment  of  image  segmenta¬ 
tion,  we  use  both  the  post-GRE  value  and  the  visual  inspection, 
where  the  post-GRE  value  is  generated  by  the  sample  averages 
based  on  1 

* k  “  *) 
i=t 

1  N 

tyxi  (30) 

t—i  , 

yVVk  *0  {xi  -  /J'k) 

and  is  given  by 

i>(AIW  =  £/,(*)  log  AM  (31) 

as  an  indirect,  but  objective  criterion,  where  fi(x)  denotes  the 
SFNM  distribution  corresponding  to  the  segmented  image.  Con¬ 
sidering  the  probabilistic  pixel  label  z  and  the  realized  pixel 
label  1,  the  pre-segmentation  SFNM  fz  (e.g.,  f(x)  estimated 
via  a  soft  clustering  procedure  with  EM)  and  post-segmentation 
SFNM  /i  are  related  through  the  image  segmentation  obtained 
by  the  MICM  algorithm.  The  post-GRE  segmentation  criterion 
presented  here  states  that  the  relative  entropy  between  the  ML 
estimate  of  the  SFNM  (via  soft  clustering)  and  the  SFNM  ob¬ 
tained  from  the  segmented  image  (via  hard  clustering)  is  min¬ 
imal  if  the  image  components  are  correctly  segmented.  The  pa¬ 
rameter  values  of  a  particular  tissue  type  in  the  estimated  his¬ 
togram  are  most  likely  to  be  equal  to  the  parameter  values  of 
the  corresponding  tissue  type  in  the  segmented  region  if  the 


table  m 

Comparison  of  Segmentation  Error  Resulting  from  Different 
Clustering  Method 


Method 

ML-EM 

ICM 

MICM 

(soft) 

(fcarrf-homogeneous) 

(Ziani-inhomogeneous) 

GRE  value  (nats) 

0.0067 

0.4406 

0.1578 

pixel  images  are  properly  classified.  This  correspondence  is  lost 
in  the  case  of  misclassification.  Our  experiment  with  the  MR 
brain  image  has  led  to  a  result  of  D(fx\\fx)  =  0.1578  nats  and 
D(f4fz)  =  0.0067  nats,  which  coincides  with  what  is  sug¬ 
gested  by  the  information  theoretic  Pythagorean  theorem  [25, 
Th.  12.6.1],  D(f^\\f{)  >  D(/x||/z)  +  £>(/z||/i).  This  result 
implies  that  the  classification  error  in  image  segmentation  con¬ 
tributes  to  the  additional  quantification  error  D(fz Table  III 
presents  the  summary  of  the  results. 

V.  Summary 

In  this  paper,  we  have  presented  a  complete  procedure  for 
quantifying  and  segmenting  major  brain  tissue  types  from  MR 
images.  This  method,  as  illustrated  by  well-planned  simulation 
and  pilot  applications  in  brain-tissue  analysis,  is  capable  of  re¬ 
vealing  abnormalities  within  data  (e.g.,  allowing  K  to  be  ad¬ 
justable)  and  can  be  applied  to  clinical  problems  such  as  those 
encountered  in  tissue  segmentation  and  quantitative  diagnosis. 
It  is  important  to  emphasize  that,  although  the  experiments  here 
are  presented  with  two-dimensional  images,  their  application  to 
3-D  data  sets  is  straightforward  and  has  been  implemented  re¬ 
cently  [32]. 

The  main  limitations  of  the  current  approach  are  that:  1)  there 
is  no  systematic  way  to  initialize  the  model  parameter  values  and 
2)  the  algorithm  for  both  the  tissue  quantification  and  model 
selection  is  frequently  trapped  in  a  local  optima.  Thus,  future 
effort  should  focus  on  a  generic  site  model  supported  param¬ 
eter  initialization  and  a  multiresolution  or  hierarchical  scheme 
for  both  model  selection  and  tissue  quantification.  Additional 
work  includes  theoretical  investigation  of  the  new  formulation 
in  model  selection,  partial  volume  effect  by  image  blurring,  and 
the  objective  criterion  for  image-segmentation  evaluation. 
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Abstract.  This  paper  describes  a  neural  computation  based  non- 
rigid  registration  methodology  using  multiple  rigid  transforms,  in 
a  piece-wise  fashion,  to  model  the  registration  process  between  im¬ 
ages  in  a  sequence.  The  registration  methodology  is  a  hybrid  ap¬ 
proach  that  combines  registration  without  exact  point  correspon¬ 
dence  via  multi-object  principal  axes,  and  registration  with  point 
correspondence  via  polynomial  transform.  Neural  computation  is 
used  to  combine  the  derived  individual  principal  axes  solutions  for 
each  object  in  a  committee  machine  formulation  and  to  obtain  the 
polynomial  transform  based  on  extracted  control  points  using  a 
multi  layer  perceptrons  (MLP).  Three  examples  are  presented  to 
demonstrate  the  techniques  involved  in  the  process.  The  first  ex¬ 
ample  uses  four  Gaussian  clusters  and  focuses  on  the  combination 
of  the  multiple  transforms  into  a  composite  transform  using  finite 
mixture  modeling  techniques  [14].  The  next  examples  present  the 
complete  process  for  prostate  cancer  registration  and  breast  se¬ 
quence  analysis  respectively.  To  verify  performance,  the  results 
are  compared  to  non-neural  based  implementations  and  other  ex¬ 
isting  registration  methods. 


INTRODUCTION 

The  estimation  of  transformational  geometry  from  two  point  sets  is  an  es¬ 
sential  step  to  medical  image  fusion  and  computer  vision  [1,  2].  Medical 
diagnosis,  for  instance,  often  benefits  from  the  complementary  of  the  infor¬ 
mation  in  images  of  different  modalities.  The  task  is  to  recover  a  matrix 
representation  requiring  a  set  of  correspondence  matches  between  features  in 
the  two  coordinate  system  [3].  Assume  two  3-D  data  point  sets  {Pi>i }  an<^ 
{Pis};  i  =  1>  2, ...,  N  are  related  by 

PiB  =  Rpiyi  4-  T  +  Ni  (1) 
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where  R  is  a  rotation  matrix,  T  is  a  translation  vector,  and  N*  is  a  noise 
vector.  Given  {p^}  and  {pis},  Arun  tl  al  present  an  algorithm  for  finding 
the  least-squares  solution  of  R  and  T,  which  is  based  on  the  decoupling- 
translation  and  rotation  and  the  singular  value  decomposition  (SVD)  of  a 
3x3  cross-covariance  matrix  [3]. 

The  major  limitation  of  the  present  method  is  twofold:  (1)  while  feature 
matching  methods  can  give  quite  accurate  solutions,  obtaining  correct  cor¬ 
respondences  of  features  is  a  hard  problem,  especially  in  the  cases  of  images 
acquired  using  different  modalities  or  from  inter-subjects;  and  (2)  a  rigidity 
assumption  is  heuristically  imposed  while  those  newly  developed  deformable 
matching  methods  can  be  computationally  expensive  and  typically  need  good 
initial  guesses  to  assure  correct  convergence  [2,  4].  One  popular  method  that 
is  correspondenceless  is  the  principal  axes  registration  (PAR)  [1],  that  is  based 
on  the  relatively  stable  geometric  properties  of  image  features,  i.e.,  the  geo¬ 
metric  information  contained  in  these  stable  image  features  is  often  sufficient 
to  determine  the  transformation  between  images  [2],  Once  again,  it  cannot 
handle  the  cases  with  non-rigid  objects. 

In  this  paper,  we  present  a  neural  computation  based  non-rigid  regis¬ 
tration  using  piece-wise  rigid  transformation.  The  novel  feature  is  to  align 
two  point  sets  without  needing  to  establish  explicit  point  correspondences, 
where  the  derivation  is  realized  by  minimizing  the  relative  entropy  between 
the  two  point' distributions  resulting  in  a  maximum  likelihood  estimate  of  the 
transformation  matrix.  A  committee  machine  approach  is  used  for  recover¬ 
ing  the  transformational  geometry  of  the  non-rigid  structures.  That  is  rather 
than  using  a  single  transformation  matrix  which  gives  rise  to  a  large  registra¬ 
tion  error,  we  attempt  to  interpolatively  apply  a  mixture  of  transformations. 
By  further  generalizing  PAR  to'  a  finite  mixture  registration  (FMR)  scheme, 
with  a  soft  partitioning  of  the  data  set,  the  mixture  is  fit  using  expectation- 
maximization  (EM)  algorithm.  We  then  applied  a  probabilistic  adaptive 
principal  components  extraction  (PAPEX)  algorithm  (10),  to  estimate  the 
transformational  of  the  orthogonal  set  of  eigenvalues  and  eigenvectors  of  the 
auto- covariance  matrix.  By  applying  a  committee  machine  to  a  non-rigid 
registration,  using  FMR  as  the  experts  and  PAPEX  as  a  gating  function,  we 
can  acquire  the  registration  based  on  a  mixture  of  piece-wise  transformations 
of  the  data  set.  Then  the  correspondences  control  points  are  obtained.  As 
a  final  step,  the  warped  image  is  obtaining  using  the  neural  network  based 
non-linear  mapping,  to  obtain  the  polynomial  transform  based  on  extracted 
control  points  using  MLP. 


THEORY  AND  METHOD 

Suggested  by  information  theory  [5],  we  note  that,  since  the  control  point  sets 
in  two  images  can  be  considered  as  two  separate  realizations  of  the  same  ran¬ 
dom  source,  we  do  not  need  to  establish  point  correspondences  to  extract  the 
transformation  matrix.  In  other  words,  if  we  denote  by  P{p%}  the  distribution 
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of  the  control  point  set  in  an  image,  we  have  the  simple  relationship 

P(pja}  =p{Rp^+T) 

where  e  is  the  noise  component.  Since  the  probability  distributions  can  be 
computed  independently  on  each  image  without  any  need  to  establish  feature 
correspondences,  and  given  the  two  distributions  of  the  control  point  sets  in 
the  two  images,  we  can  recover  the  transformation  matrix  in  a  simple  fashion 
[2],  as  we  now  sketch. 

For  observation  of  the  distributions,  we  can  estimate  R  and  T  by  mini¬ 
mizing  the  relative  entropy  (Kullback-Leibler  distance)  between  R{Pjfl}  and 
P{ Rp.^+T}*  Tiie  least  relative  entropy  estimator  is  then  defined  as 

axgmmD(P{PjB}||P{Rp./t+T})  (3) 

where  D  denotes  the  relative  entropy  measure.  Following  the  same  strategy 
to  decoupling  translation  and  rotation  as  in  [3],  we  can  define  a  new  data 
point  by  q,yi  —  pt/4  —  p°  and  =  PjB  —  Ps>  where  p^  and  p%  are  the 
centroids  of  {pa}  and  {pjB}  respectively.  Then  the  optimal  geometric 
transformations,  R  and  T,  are  defined  as 

R  =  UbHU^  and  T  =  p#  -  Rp^  (4) 

where  the  superscript  t  denotes  matrix  transposition,  V  a  and  Ufi  are  3  x  3 
orthonormal  matrices,  and  H  is  a  3  x  3  diagonal  matrix  with  element  hm  = 
yJ\mB 7 Am A -  Note  that  the  transformation  U  consists  of  the  orthonormal  set 
of  eigenvectors  and  hm  is  the  squared  root  of  the  eigenvalues  Am  of  the  auto¬ 
covariance  matrix  C  for  m  —  1,2,3  and  for  {pm}  and  {P;b},  respectively. 

However,  because  of  its  global  linearity,  the  application  of  PAR  is  neces¬ 
sarily  somewhat  limited  [6].  An  alternative  paradigm  is  to  model  a  multi¬ 
modal  control  point  set  with  a  collection  of  local  linear  models.  The  method 
is  a  two-stage  procedure:  a  soft  partitioning  of  the  data  set  followed  by  es¬ 
timation  of  the  principal  axes  within  each  partition  [7].  Recently  there  has 
been  considerable  success  in  using  standard  finite  normal  mixture  (SFNM) 
to  model  the  distribution  of  a  multimodal  data  set  and  the  association  of  a 
SFNM  distribution  with  PAR  offers  the  possibility  of  being  able  to  register 
two  images  through  a  mixture  of  probabilistic  principal  axes  transformations 
[7],  Assume  that  there  are  Kq  control  point  clusters,  where  each  control  point 
cluster  defines  a  transformation  {Rjb,Tfc}.  Thus  for  a  point  p nA,  new 

cations  corresponding  to  each  of  the  transformations  are  pnfc  =  RkPnA  +  T \ 
for  k  =  1, ...,  Kq.  Further  assume  that  the  control  point  set  defines  a  SFNM 
distribution 

Kq 

/(P«)  =  (5) 

k=l 

where  g  is  the  Gaussian  kernel  with  mean  vector  and  auto- covariance 
matrix  Cfc,  and  7rk  is  the  mixing  factor  proportional  to  the  number  of  control 
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points  in  cluster  k.  For  each  of  the  control  point  sets  {piy4}  and  {p;s},  the 
mixture  is  fit  using  the  expectation-maximization  (EM)  algorithm.  The  E 
step  involves  assigning  to  the  linear  models  responsibilities  from  the  control 
points;  the  M  step  involves  re-estimating  the  parameters  of  the  linear  models 
in  the  light  of  this  assignment  [7). 

Thus  the  statistical  membership  of  point  pnA  belonging  to  each  of  the 
control  point  clusters  can  be  derived  by 

znk  =  P(Rk, Tk\PnA)  =  (6) 

/(P«/0 

i.e.,  the  posterior  probabihty  of  {Rfc,Tfc}  given  pnA.  Thus,  we  can  define 
the  FMR  transformation  as 

p»  =  E  *****  =  E  ^Mg(pyJMY’Cfc,4)(  RkPnA + T*)  (7) 

1  1  JKPnA) 

where  {R^T*}  is  determined  based  on  /(p.)  =  ^kg{Pi\fik,  Ck)  that 
we  have  estimated  in  the  previous  step  using  the  EM  algorithm.  Note  that 
now  we  do  need  the  correspondences  between  the  two  control  point  clus¬ 
ters,  and  these  correspondences  may  be  found,  after  a  global  PAR  is  initially 
performed,  by  using  a  site  model  supported  approach  or  a  dual-step  EM  al¬ 
gorithm  to  unify  the  tasks  of  estimating  transformation  geometry  and  iden¬ 
tifying  cluster-correspondence  matches  [4],  This  philosophy  for  recovering 
transformational  geometry  of  the  non-rigid  objects  is  similar  in  spirit  to  the 
divide-and- conquer  principle  [6],  under  which  the  relative  entropy  between 
the  two  point  sets  reaches  its  minimum 

ans  min  (8) 

both  globally  and  locally. 

Based  on  a  mixture  of  probabilistic  principal  axes  transformations,  the 
next  section  describes  a  neural  computation  using  a  committee  machine  ap¬ 
proach  for  which  a  complex  computational  task  is  solved  by  dividing  it  into 
a  number  of  computationally  simple  tasks  and  then  combining  the  solutions 
to  those  tasks.  Basically,  it  fuses  knowledge  acquired  by  experts  to  arrive  at 
an  overall  decision  that  is  supposedly  superior  to  that  attainable  by  any  one 
of  them  acting  alone. 


NEURAL  COMPUTATION 

A  neural  network  interpretation  of  the  EM  algorithm  is  given  in  [8].  Be¬ 
cause  of  its  reputation  of  being  slow  in  which  new  information  acquired  in 
the  expectation  step  is  not  used  immediately,  on-line  versions  of  the  EM  al¬ 
gorithm  are  proposed  for  large-scale  sequential  learning.  Thus,  we  adopt 
a  fully  unsupervised  and  incremental  stochastic  learning  algorithm.  The 
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scheme  provides  winner- takes-in  probability  (Bayesian  “soft”)  splits  of  the 
control  points,  hence  allowing  the  data  to  contribute  simultaneously  to  mul¬ 
tiple  clusters  which  results  in 


E  —  Step 


2(t+i)fc 


(9) 


M  -  Step 


Hk +1)  =  +  o(*)( Pi+l  -  Vk])z(i+l)k, 

:(i+D  =  cw  +  6(i)[(Pi+1  -  Mfc})(Pi+i  -  HhV  -  Cfl^i+Dfc, 


Ji+D  _  _J_ 


■iri'  +  - 


'z(.i+l)k 


for  k  =  1, ....  i<"0  and  for  {pi^}  and  {p,b},  respectively,  where  a(i)  and  b(i) 
are  introduced  as  the  learning  rates,  two  sequences  converging  to  zero,  ensur- 
ing  unbiased  estimates  after  convergence.  This  procedure  is  termed  as  neural 
computation  of  the  EM  algorithm,  where  at  each  complete  cycle  of  the  algo¬ 
rithm,  we  first  use  “old”  set  of  parameter  values  to  determine  the  posterior 
probabilities  Z(i+X)k.  These  posterior  probabilities  are  then  used  to  obtain 

“new”  values  7r£+1\  /x£1+l\  The  algorithm  cycles  back  and  forth  un¬ 

til  the  value  of  relative  entropy  between  the  data  histogram  and  mixture 


model  reaches  its  minimum 


arg  min  >  I  |/(Pi )) 


(13) 


for  {pt.4 }  and  {pie}>  respectively. 

With  a  soft  partitioning  of  the  data  set  using  Eqs.  (9-12),  control  points 
will  now  effectively  belong  to  more  than  one  cluster  spatially.  Thus,  the  ef¬ 
fective  input  values  are  p^  =  Z{k(pi  —  Mfc)  f°r  an  independent  registration 
transformation  k  in  the  committee  machine  [9].  We  then  extend  our  adaptive 
principal  components  extraction  (APEX)  algorithm  to  a  probabilistic  ver¬ 
sion,  i.e.,  PAPEX  [10],  to  determine  Uk  for  {piA}  and  {pta},  respectively, 
summarized  as  follows. 


1.  Initialize  the  feedforward  weight  vector  umfc  for  m  =  1,2,3,  and  the 
feedback  weight  vector  amk  to  small  random  values  for  m  =  2, 3,  at 
time  i  =  l.  Assign  a  small  positive  value  to  the  learning  rate  parameter 

77* 


2.  Set  m  =  1,  and  for  i  =  1, 2, compute 

yifc(t)  =  Uu(i)zifc(Pi  ~  Mfc)  (14) 

Ui fc(i  +  1)  =  uifc(i)  +  r][yik{i)zik(Pi  ~  Pk)  ~  !/ifc(*)uifcW] 

For  large  i  we  have  uut(f)  — *  nut,  where  uxk  is  the  eigenvector 
associated  with  the  largest  eigenvalue  of  the  cluster  fc,  and  Aifc  = 

£E  LiviM 
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3.  Set  to  =  2,  and  for  i  =  1, 2, ...,  compute 

y(m~i)fc(0  =  [yiJb(0)y2JbW»-iy(m-i)fc(*)]T  (16) 

l/mfc(i)  =  uLfc(Ozifc(P<  -  Mfc)  +  a^fc(t)y(?n«i)fc(t)  (17) 

am/c  (l  +  1)  =  Umfc(z)  4"  ^[ymA:  (^)^tfc(pi  Mfc)  V-mk  (^)umfc  (01  (1$) 

am/c(^  4*  1)  —  amA: (^)  l)fc(0  4”  (^)l  (^) 

For  large  i  we  have  U2fc(i)  — *  U2*,  where  112*  is  the  eigenvector 
associated  with  the  second  largest  eigenvalue  of  the  cluster  /c,  and 

A2j k  =  *EHi&(i). 

4.  Set  to  =  3,  go  to  step  3.  For  large  i  we  have  U3 ^  (z)  — *  113*.,  where 
U3fc  is  the  eigenvector  associated  with  the  third  largest  eigenvalue  of 
the  cluster  k,  and  X3ic  =  jj  ylk{i). 

The  neural  computational  of  a  committee  machine  can  be  achieved  by 
distributing  the  learning  tasks  among  a  number  of  experts,  which  in  turn 
partitioning  the  input  space  into  a  set  of  subspaces.  The  experts  are  in 
theory  performing  supervised  learning  in  that  there  individual  outputs  are 
combined  to  model  the  desired  response.  There  is,  however,  a  sense  in  which 
the  experts  are  also  performing  self-organized  learning;  that  is,  they  self 
organize  to  find  a  good  partitioning  of  the  input  space  so  that  each  expert 
does  well  at  modeling  its  own  subspace,  and  as  a  whole  group  they  model 
the  input  space  well. 

Consider  the  modular  network  nature  of  Eq.  (7),  it  is  a  mixture  of  local 
expert  model,  in  which  the  individual  responses  of  the  experts  are  nonlinearly 
combined  by  means  of  a  single  gating  network.  It  is  assume- here  that  the 
different  experts  work  best  in  different  regions  of  the  input  space  in  accor¬ 
dance  with  the  probabilistic  generative  model.  In  our  case,  the  effective  input 
values  are  =  ^(p*  —  fJ>k)  f°r  an  independent  registration  transformation 
k  in  the  networks.  A  committee  machine  consists  of  k  supervised  modules, 
with  a  soft  partitioning  of  the  data  set  using  EM  algorithm  called  experts, 
and  an  integrating  unit  called  a  gating  network  that  performs  the  function 
PAPEX  to  determine  the  transformation  of  the  orthogonal  set  of  eigenvalues 
and  eigenvectors  of  the  auto-covariance  matrix  among  the  expert  networks. 
The  output  of  our  committee  machine  is  a  transformational  matrix  of  image 
pair.  Based  on  those  correspondences  between  the  two  images,  the  control 
points  are  obtained.  The  final  step  is  to  calculate  the  polynomial  transfor¬ 
mation  using  piece-wise  interpolation.  In  our  case  for  a  given  corresponding 
control  points  pair  of  the  image,  we  considered  neural  network  based  MLP  to 
acquired  the  polynomial  transform.  The  final  result  shows  the  warped  image 
that  has  been  corrected  by  applied  our  neural  computation  to  correct  most 
of  the  scale  different  between  the  images. 
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Figure  1:  A  mixture  Gaussian  data  set  (left:  Gaussian  model;  middle:  FMR  regis¬ 
tration  result;  right:  inverse  FMR  result) 

RESULTS  AND  DISCUSSIONS 

To  evaluate  the  effectiveness  of  our  algorithm,  three  simulations  were  per¬ 
formed.  The  first  simulation  explored  the  performance  of  FMR  under  con¬ 
trolled  conditions  while  the  other  two  simulations  considered  the  complete 
processing  chain  applied  to  real  world  problems.  Next,  each  of  these  simu¬ 
lations  is  discussed. 

The  first  simulation  is  a  mixture  of  600  data  points  generated  from  four 
Gaussian  clusters  in  3-D  space.  Figure  1  (left)  is  the  original  image;  the 
global  view  of  the  data  clearly  suggests  the  presence  of  four  distinct  clusters 
within  the  data.  We  considered  three  of  the  four  Gaussian  clusters  as  the 
control  objects'  and  the  remaining  object  as  a  noncontrol  object.  Each  of 
the  clusters  were  translated  and  rotated  by  different  amounts  to  simulate  a 
non-global  transform.  With  a  mixture  of  all  the  three  local  transforms,  we 
acquired  a  non-rigid  transform  of  a  non-control  object  based  on  a  committee 
machine  approach  as  shown  in  Figure  1  (middle).  To  show  the  robustness  of 
our  propose  method,  we  applied  our  algorithm  to  the  transformed  image  as 
an  inverse  process  resulting  in  Figure  1  (right).  The  performance  metric  is 
mean  square  error  (MSE)  of  the  non-control  object.  Several  runs  of  this 
example  were  conducted  to  average  out  randomness  of  the  data  points.  The 
results  show  a  MSE  of  7.6%  between  the  final  image  and  original.  Thus, 
showing  the  usefulness  of  this  approach. 

Next,  we  extended  our  algorithm  to  a  3-D  prostate  registration  problem. 
Based  on  our  previous  work  [14],  this  3-D  model  contains  a  precise  proba¬ 
bilistic  map  of  prostate  tumor  distribution  and  the  corresponding  anatomic 
structure  of  a  prostate.  The  goal  of  this  example  is  to  register  the  tumor 
inside  a  prostate,  with  a  reference  model  using  prostate  anatomic  structure  as 
a  control  objects.  We  denote  the  reference  model  as  shown  in  Figure  2(left) 
and  the  float  model  with  non-control  object  as  shown  in  Figure  2(middle), 
We  find  object  to  object  corresponding  and  let  tumor  in  the  float  model  be 
a  non-control  object.  After  we  applied  our  hybrid  committee  machine  algo¬ 
rithm  to  the  reference  and  the  float  model,  based  on  piece- wise  registration, 
each  control  points  are  effectively  belong  to  more  than  one  cluster  hence  al¬ 
lowing  the  data  to  contribute  simultaneously  to  multiple  clusters.  Figure 
2(right)  shows  the  result  using  neural  computation  to  register  tumor  in  a 
float  model  to  a  reference  model. 
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Figure  2:  Prostate  Model  used  in  registration  (left:  reference  model;  middle:  float 
image;  right:  registration  cancer  class  of  float  image  to  reference  image  using  FMR 
method) 

As  a  final  example,  we  applied  our  algorithm  to  a  temporal  sequence  of 
mammograms  of  a  single  patient  [15].  In  this  example,  the  committee  ma¬ 
chine  is  used  to  obtain  an  initial  registration  using  multiple  extracted  objects 
(skinline,  dense  tissue  regions)  in  a  finite  mixture  scheme.  Then  MLP  was 
used  to  determine  the  coefficients  of  a  polynomial  transform  using  extracted 
vertical  and  horizontal  cross  points  of  elongated  structures  as  control  points. 
Previously,  thin-plate  spline  (TPS)  was  used  to  determine  the  transform  coef¬ 
ficients  [13].  As  a  comparison,  we  consider  both  results  here.  Figure  3  shows 
the  raw  sequence  and  Figure  4  shows  the  resulting  warped  image  for  MLP 
(left)  and  TPS  (right).  From  visual  inspection  we  see  that  by  using  MLP 
the  most  of  the  scale .  different  between  images  has  been  corrected.  While 
the  TPS  distorts  the  image,  this  distortion  can  be  attributed  to  control  point 
selection  and  correspondence.  The  MLP  better  adapts  to  the  error  present 
in  the  control  points  thus  yielding  a  smoother  result. 


CONCLUSION 

In  this  paper,  we  presented  the  theoretical  concepts  and  methods  of  a  neu¬ 
ral  computation  based  non-rigid  registration  algorithm.  The  approach  uses 


Figure  3:  A  pair  of  real  mammograms  taken  over  a  period  of  time  from  the  same 
patient. 
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Figure  4:  The  result  of  warped  current  image  to  previous  image  (left:  MLP;  right: 
TPS). 

a  committee  machine  to  recover  the  total  transformational  geometry  of  the 
non-rigid  object  using  multiple  rigid  transforms  combined  together  in  a  fi¬ 
nite  mixture  registration  scheme.  Finite  mixture  transform  combination  is  a 
novel  technique  to  combine  multiple  transforms  that  are  contained  in  a  sin¬ 
gle  image.  Other  than  local  transforms  no  other  method  combines  multiple 
transforms  together.  In  addition  finite  mixture  combinations,  yields  a  lower 
MSE  than  the  local  transform  and  results  in  a  smooth  image  while  local 
transforms  yield  an  image  containing  discontinues  on  transform  boundaries. 
In  addition,  the  registration  obtained  in  the  committee  machine  is  fine  tuned 
using  a  non-linear  transform  generated  by  a  MLP  network  using  extracted 
control  points. 

We  applied  this  algorithm  to  prostrate  and  breast  registration  problems 
with  reasonable  results  as  shown  in  the  previous  figures.  Some  distortion 
can  be  seen  in  the  final  warped  images  because  of  the  error  in  control  point 
selection  and  correspondence.  Improvement  is  this  portion  should  decrease 
distortion  and  yield  a  smoother  looking  image.  Using  neural  networks  in 
this  problem  has  increased  the  generality  of  this  approach  by  allowing  the 
algorithm  to  adjust  performance  as  imaging  condition  change. 
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Abstract 


In  the  clinical  course  of  detecting  masses,  mammographers  usually  evaluate  the  surrounding  background 
of  a  radiodense  when  breast  cancer  is  suspected.  In  this  study,  we  adapted  this  fundamental  concept  and 
computed  features  of  the  suspicious  region  in  radial  sections.  These  features  were  then  arranged  by  circular 
convolution  processes  within  a  neural  network,  which  led  to  an  improvement  in  detecting  mammographic 
masses. 

In  this  study,  randomly  selected  mammograms  were  processed  by  morphological  enhancement 
techniques.  Radiodense  areas  were  isolated  and  were  delineated  using  a  region  growing  algorithm  with  a  valley 
blocking  technique.  The  boundary  of  each  region  of  interest  was  then  divided  into  36  sectors  using  36  equi- 
angle  dividers  radiated  from  the  center  of  the  area.  Four  features  at  each  section  were  computed:  (1)  the  radius, 
(2)  the  normal  angle  of  the  boundary,  (3)  the  average  gradient  along  the  radial  direction,  and  (4)  the  gray  value 
difference  (i.e.,  contrast)  along  the  radial  direction.  Hence,  144  computed  features  (i.e.,  4  features  per  sector  for 
36  sectors)  were  used  as  input  values  for  the  newly  invented  multiple  circular  path  neural  network  (MCPNN). 
The  neural  network  is  constructed  to  emphasize  on  the  correlation  information  associated  with  the  feature 
interactions  within  the  angle  and  between  adjacent  angles. 

We  have  tested  this  approach  on  our  research  database  consisting  of  91  mammograms.  The  over-all 
performance  in  the  detection  of  masses  was  0.78-0.80  for  the  areas  (Az)  under  the  ROC  curves  using  the 
conventional  neural  network.  However,  the  performance  was  improved  to  Az  values  of  0.84-0.89  using  the 
multiple  circular  path  neural  network. 
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1.  Introduction 

It  is  well  known  that  effective  treatment  of  breast  cancer  calls  for  early  detection  of  cancerous  lesions 
(e.g.,  clustered  microcalcifications  and  masses  associated  with  malignant  cellular  processes)  [1-3].  Breast 
masses  appear  as  areas  of  increased  density  on  mammograms.  It  is  particularly  difficult  for  radiologists  to 
detect  and  analyze  a  suspected  area  where  a  mass  is  overlapped  with  dense  breast  tissue.  These  masses  are 
more  readily  seen  as  time  progresses,  but  the  further  the  tumor  has  progressed,  the  lower  the  possibility  of  a 
successful  treatment.  Therefore,  increasing  the  chances  of  early  breast  cancer  detection  in  improving  today's 
clinical  system  is  of  vital  importance  to  breast  cancer  patients. 

Several  research  groups  have  developed  computer  algorithms  for  automated  detection  of  mammographic 
masses  [4-8].  Investigators  also  attempted  to  classify  the  malignant  or  benign  nature  of  the  detected  tumors  [5]. 
The  results  of  these  detection  programs  indicate  that  a  high  true-positive  (TP)  rate  can  be  obtained  at  the 
expense  of  2  or  3  false-positive  (FP)  detections  per  mammogram.  This  FP  rate  is  unacceptably  high  for  mass 
detection  in  clinical  practice.  Mammographically,  a  multiplicity  (more  than  two)  of  similar  benign-appearing 
breast  lesions  argues  strongly  for  benignity  [9-12]  and,  indeed,  the  more  masses  that  are  identified,  the  less 
chance  that  they  represent  cancer  [12].  If  the  computer  indicates  multiple  detections  on  each  mammogram,  the 
radiologist  has  to  seek  out  the  one  mass  that  has  mammographic  features  that  differ  from  the  others.  The 
significant  lesion  may  be  missed  due  to  the  multiplicity  of  possible  lesions.  We  therefore  believe  that  a  more 
useful  and  fundamental  approach  to  computer-aided  diagnosis  (CADx)  of  masses  is  to  devise  computer 
programs  to  analyze  features  of  a  suspected  mass  [13,14],  which  are  detected  by  the  radiologist,  and  provide 
feature  measures  and  estimates  of  the  likelihood  of  malignancy  by  comparing  the  computer's  database.  The 
computer  therefore  serves  as  a  second  opinion  and  also  provides  a  reproducible  and  objective  evaluation  of  the 
mass.  With  this  aid,  the  radiologist  may  also  increase  his/her  sensitivity  by  lowering  the  threshold  of  suspicion, 
while  maintaining  the  overall  specificity  and  reading  efficiency. 


2.  Clinical  Background  of  Breast  Lesions  and  Technical  Approach  in  Mass  Detection 
2.1  Description  of  clinical  background  , 

Most  commonly,  breast  cancer  presents  as  a  mass.  The  same  lesion  shows  a  somewhat  different  picture 
from  one  projection  to  the  other.  Difficulties  in  mass  detection  also  vary  with  the  underlying  breast 
parenchyma.  In  the  fatty  breast,  masses  are  generally  easy  to  detect.  With  the  dense  breast,  mass  detection  is 
more  difficult  and  auxiliary  signs  aid  this  detection.  Breasts  can  contain  one,  several,  of  many  masses.  When 
there  is  one  mass,  the  decision  process  is  based  on  its  size,  shape,  and  margins.  The  larger  the  mass  is  and  the 
less  well-defined  its  margins,  the  greater  the  chance  of  cancer.  When  there  are  several  masses,  one  looks  at 
each,  trying  to  determine  whether  any  has  features  to  suggest  cancer  (poorly  defined,  spiculate,  unusually 
radiodense  for  size)  and  one  also  looks  to  see  whether  any  mass  is  different  in  appearance  from  the  others. 
Multiple  small,  well-defined,  similar  masses  presenting  bilaterally  are  all  likely  to  be  benign.  The  greater  the 
asymmetry,  size,  lack  of  circularity,  edge  unsharpness,  and  radiodensity,  the  more  suspicious.  In  this  study,  we 
used  several  computational  features  (see  section  3.2)  highly  associated  with  four  major  features  of  breast  masses 
routinely  used  in  clinical  reading: 

Density  -  Malignant  lesions  tend  to  have  greater  radiographic  density  due  to  high  attenuation  and  less 
compressibility  of  cancer  than  normal  tissue.  Radiolucent  lesions  are  typically  benign  and  the  diagnosis 
can  be  made  from  the  mammogram. 

Size  -  If  the  lesion  has  morphological  features  suggesting  malignancy,  it  should  be  considered  suspicious 
regardless  of  the  size.  Isolated  masses  with  non-cystic  densities  greater  than  8  mm  in  diameter  can  be 
malignant.  In  general,  the  larger  a  lesion,  the  more  suspicious  it  is. 

Shape  -  The  more  irregular  the  shape  of  a  lesion,  the  more  likely  the  possibility  of  malignancy.  Lesions  tend  to 
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be  round,  ovoid  and/or  lobulated.  Small  and  frequent  lobulations  are  suspicious.  Lesions  in  the  lateral 
aspect  of  the  breast  near  the  edge  of  the  parenchyma  with  a  reniform  shape  and  a  hilar  indentation  or 
notch  usually  represent  a  benign  intramammary  lymph  node.  Breast  carcinoma  hidden  in  the  dense 
tissues  can  cause  parenchymal  retraction,  which  possess  different  shapes. 

Margins  -The  margins  of  the  lesion  should  be  carefully  evaluated  for  areas  of  spiculation,  stellate  patterns  or  ill- 
defined  regions.  Most  breast  cancers  have  ill-defined  margins  secondary  to  tumor  infiltration  and 
associated  fibrosis.  The  appearance  of  spiculations  and  a  more  diffuse  stellate  pattern  are  almost 
pathognomonic  for  cancer.  Lesions  with  sharply  defined  margins  have  a  high  likelihood  of  being 
benign;  however,  up  to  7%  of  malignant  lesions  can  be  well  circumscribed. 

These  are  known  clinical  features  and  have  been  adapted  in  “Breast  Imaging  -  Reporting  and  Data  System” 
(BI-RAD)  [15]  of  the  American  College  of  Radiology  (ACR). 


2.2.  Technical  approach  for  detection  of  mammographic  masses 

In  this  study,  our  goal  was  to  extract  clinically  suspicious  lesions.  The  differentiation  of  benign  and 
malignant  status  was  beyond  the  scope  of  this  work.  Hence,  we  will  only  provide  methods  in  extracting 
potential  lesions  from  glandular  tissue  in  the  following  sections.  (Note  that  lesions  can  be  overlapped  with 
dense  breast  parenchyma.)  The  study  was  conducted  with  the  following  steps:  (1)  use  background  correction 
method  and  morphological  operations  to  extract  radio-opaque  areas,  (2)  delineate  the  boundary  of  the  areas,  (3) 
compute  the  features  and  texture  of  the  masses  with  emphasis  on  the  boundary,  and  (4)  design  training  strategy 
using  neural  networks  as  classifiers  for  the  recognition  of  mass  features.  An  overall  detection  scheme  of  the 
study  framework  is  shown  in  Figure  1 . 


Figure  1 .  A  system  flowchart  for  the  detection  of  masses  in  this  study. 


3.  Development  of  Technical  Methods 

3.1.  Preprocessing  for  image  consistency  and  mass  enhancement  using  morphological  operations 

One  of  the  main  difficulties  in  automatic  mass-detection  is  that  mammographic  masses  are  often 
overlapped  with  breast  tissues.  In  such  cases,  it  is  necessary  to  remove  bright  background  caused  by  breast 
tissues  but  to  keep  mass-signals.  For  this  purpose,  background  correction  is  an  indispensable  technique  for  mass 
detection. 

The  theory  of  mathematical  morphology  is  powerful  in  analyzing  and  describing  geometrical  relations. 
Essentially  it  is  a  formalization  of  intuitive  concepts  such  as  size  or  shape.  The  two  basic  morphological 
operations  are  “erosion”  and  “dilation,”  which  are  consistently  defined  for  binary  and  gray-scale  images.  Using 
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these  two  basic  operations,  two  other  basic  and  important  operators,  “opening”  and  “closing”,  can  be  defined  as 
follows: 


opening:  Xb=(XQB)®B,  ...(1) 

closing:  B)Q  B,  ...(2) 


where  X  indicates  the  original  image,  B  represents  the  structuring  element,  and  ©  and  9  indicate  the  operations 
“dilation”  and  “erosion,”  respectively.  Based  on  the  “opening”  operation,  we  have  developed  an  operation  for 
background  correction.  The  operation  is  represented  by 


X-XB=X-(XQ  B)®B. 


...(3) 


This  equation  represents  the  subtraction  of  the  image  processed  by  the  operator  “opening”  from  the  original 
image. 

Figure  2  shows  the  effect  of  the  operation  represented  by  equation  (3):  (A)  illustrates  a  structuring 
element,  (B)  shows  the  original  signal  (gray  line)  and  the  processed  signal  (black  line)  by  “opening”,  and  (C) 
denotes  the  final  output  signal  of  the  morphological  operation.  The  final  profile  in  (C)  was  obtained  by 
subtracting  the  black  profile  signal  from  the  gray  profile  signals  in  (B).  Note  that  the  detected  peak  signals  were 
not  affected  by  the  operation.  Hence  the  mass  signals  detected  by  the  operation  retain  their  original  shapes. 

As  can  be  seen  in  this  graph,  the  size  of  the  detected  peak  significantly  depends  on  the  size  of  the 
structuring  element.  All  peaks,  which  are  smaller  than  the  structuring  element,  can  be  detected.  In  our  mass 
detection  process,  a  52  pixel-diameter  structuring  element  will  be  used  to  detect  masses  whose  sizes  are  less  than 
52  pixels  in  diameter.  An  object  with  a  diameter  of  52  pixels  in  a  512x625  pixel  reduced  image  occupies  250 
pixels  in  its  original  digitized  image,  and  its  real  size  is  expected  to  be  about  2.5  cm. 


Figure  2.  An  example  of  the  morphological  operation:  (A)  structuring  element,  (B)  original  signal  (gray  line) 
and  signal  after  opening  (black  line),  and  (C)  output  signal  of  the  morphological  filtering. 


3.2.  Feature  extraction  of  masses 

Feature  extraction  methods  have  played  essential  roles  in  many  pattern  recognition  tasks.  Once  the 
features  associated  with  an  image  pattern  are  extracted  accurately,  they  can  be  used  to  distinguish  one  class  of 
patterns  from  the  others.  Recently,  many  investigators  have  found  that  the  multilayer  perceptron  neural 
network  using  the  error  back  propagation  training  technique  is  a  very  powerful  tool  to  serve  as  an  analyzer  (or 
classifier).  Recently,  the  back  propagation  neural  network  (BPNN)  for  classification  of  features  has  widely 
been  used  in  the  field  of  computer-aided  diagnosis  [  1 6-20] . 
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The  success  of  using  an  analyzer  for  a  pattern  recognition  task  would  rely  on  two  issues:  (a)  selected 
features  that  could  describe  discrepancy  between  patterns  and  (b)  accuracy  of  the  feature  computation.  Should 
either  one  fail,  no  analyzer  or  classifier  would  be  able  to  achieve  the  expected  performance.  By  analyzing  many 
clinical  samples  of  various  sizes  of  masses,  we  found  that  the  peripheral  portion  of  the  mass  plays  an  important 
role  for  mammographers  to  make  a  diagnosis.  The  mammographer  usually  evaluates  the  surrounding 
background  of  a  radiodense  area  when  breast  cancer  is  suspected. 

We,  therefore,  performed  boundary  detection  of  the  suspected  masses  on  the  morphologically  enhanced 
mammogram.  A  region  growing  with  valley  blocking  technique  was  employed  to  delineate  all  the  suspected 
areas.  Then,  the  boundary  was  divided  into  36  sectors  (i.e.,  10°  per  sector)  using  36  equi-angle  dividers 
radiated  from  the  center  of  suspicious  area.  The  following  features  were  computed  within  each  10°  sector  of  the 
area: 

(a)  "1"  -  the  length  from  the  center  of  mass  to  the  shortest  boundary  segment. 

(b)  "a"  -  the  normal  angle  of  the  boundary  segment  (or  the  value  of  cos(a)). 

(c)  "g"  -  the  average  gradient  of  gray  value  on  the  segment  along  the  radial  direction. 

Technically  speaking,  this  set  of  gradient  values  may  also  serve  as  a  fuzzy  system  for  the  input  layer  in 
the  neural  network  to  be  described. 

(d)  "c"  -  the  gray  value  difference  (i.e.,  contrast)  along  the  radial  direction.  Averaged  gray  value  (hi)  calculated 

from  the  mass  area  located  at  "173  inside  the  boundary  and  the  average  background  value  (b0) 
calculated  from  the  peripheral  area  near  "173  outside  of  the  suspicious  area). 

Hence,  a  total  of  144  computed  features  (4  features/sector  for  36  sectors)  can  be  used  as  input  values  for  the 
analysis  of  suspicious  areas.  The  relationship  between  the  computed  features  and  BI-RADS  descriptors  are 
discussed  below: 

(1)  Mass  Size  - 

The  36  "1"  values  would  provide  sufficient  data  for  the  neural  network  to  determine  the  size. 

(2)  Mass  Shape  (round,  oval,  lobulated,  or  irregular)  - 

The  36  "1"  and  36  "a"  values  could  approximate  the  shape  of  a  mass. 

(3)  Mass  Margin  (circumscribed,  microlobulated,  obscured,  ill-defined,  or  spiculate)  - 

The  36  "g"  and  36  "1"  values  should  be  able  to  describe  the  characteristics  of  the  mass  margin. 

(4)  Mass  Density  (fat-containing,  low  density,  isodense,  or  highly  dense)  - 

The  36  "c"  and  36  "g"  values  would  be  able  to  describe  the  density  of  the  mass. 

In  short,  the  selected  features  are  greatly  associated  with  the  main  mass  descriptors  indicated  in  the  BI-RADS. 
The  reason  for  using  36  values  for  each  nominated  feature  is  four-fold:  (a)  mass  boundary  varies,  it  is  difficult 
to  describe  an  image  pattern  using  a  single  value;  (b)  due  to  the  general  shape  of  the  masses,  the  features  of 
masses  can  be  easily  analyzed  by  the  polar  coordinate  system;  (c)  in  case  some  features  are  inaccurately 
computed  in  several  directions  due  to  the  structure  noises,  such  as  the  breast  slender  lines,  there  may  still  exist  a 
sufficient  number  of  correct  features;  (d)  generally  more  accurate  results  can  be  produced  by  using  subdivided 
parameters  rather  than  using  global  parameters  in  a  pattern  recognition  task.  Other  computational  features  (e.g., 
difference  entropy  [8]  and  other  higher  order  features)  are  eligible  but  require  further  investigation. 


3.3.  The  neural  network  structures  specifically  designed  for  the  extracted  boundary  features 

(A!  Multiple  paths  with  circular  networking  to  instruct  the  neural  network  in  analyzing  sector  features 

We  designed  several  neural  network  connections  between  the  input  and  the  first  hidden  layers  as  shown 
in  Figure  3.  Figure  3(A),  (B),  and  (C)  illustrate  the  full  connection,  a  self  correlation  (SC)  networking,  and  a 
neighborhood  correlation  (NC)  networking,  respectively.  Note  that  the  input  and  hidden  nodes  should  be 
completely  matched  when  combining  more  than  one  path  in  the  study.  In  this  case,  the  correlation  layers  only 
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function  as  branch  connections  between  input  and  hidden  layers.  When  using  NC  paths,  networking 
engagement  within  multiple  sectors  (e.g.,  20°,  30°,  40°,  and  50°  of  the  neighborhood  correlation)  can  be 
grouped.  The  method  of  using  the  multiple  correlation  connections  was  motivated  by  our  two-dimensional 
convolution  neural  network  (2-D  CNN)  research  experience  where  we  found  that  more  than  10  multiple 
convolution  kernels  were  necessary  to  archive  an  outstanding  neural  network  performance  in  the  detection  of 
lung  nodules  and  microcalcifications  [17]. 

Compared  to  2-D  CNN  systems,  the  required  computation  using  1-D  input  features  (i.e.,  144)  is 
relatively  small.  The  combination  of  the  networking  paths  described  earlier  for  MCPNN  was  implemented 
using  C  programming  language.  The  internal  computation  algorithm  used  in  the  MCPNN  shares  the  same 
convolution  process  as  that  in  the  2-D  CNN  [17].  One  additional  training  method  using  flipping  invariance  for 
1-D  convolution  kernels  was  employed  and  is  described  in  the  section  3.3.(B). 

The  fully  connected  neural  network  is  a  standard  back  propagation  neural  network.  The  signals  of  the 
fully  connected  neural  network  join  the  other  two  network  processes  (SC  and  NC  paths)  at  the  single  node  of 
the  output  layer.  The  signal  received  at  the  output  node  is  scaled  between  0  and  1 .  During  the  training,  0  and  1 
were  assigned  at  the  output  node  to  perform  back  propagation  computation  for  a  non-mass  and  a  mass, 
respectively.  The  back  propagation  was  computed  in  such  a  way  that  the  computed  incremental  errors  (see 
equations  (9)  and  (10)  were  retraced  into  three  independent  network  paths  (full-connected,  SC,  and  NC  paths). 
Besides  the  output  layer,  the  SC  and  NC  signals  were  independently  arranged  and  are  processed  through  two 
one-dimensional  convolution  processes  in  the  forward  propagation.  The  learning  algorithms  for  all  three  paths 
were  based  on  the  backpropagation  training  method. 

Let  N°(n,  s)  represents  input  signal  at  the  node  n  and  sector  s.  The  signal  received  at  each  node  on  the 
first  hidden  layer  of  the  SC  path  is 


where  bsc(s)  represents  the  bias  in  sth  sector.  The  signal  gets  into  each  node  on  the  first  hidden  layer  of  the  NC 
path  is 


where  b(s)  represents  the  bias  in  sth  sector  and  si  is  2  to  cover  —20  degree  to  20  degrees  of  the  fan.  The  signals 
in  other  hidden  layers  in  each  path  are  processed  the  same  as  the  standard  fully  connected  neural  network.  The 
output  signal  was  collected  from  the  last  hidden  layer  and  is  given  by, 

0  =  S(N‘(n)W'(n)>,  (6) 


where  /  denotes  the  hidden  layer,  p  denotes  the  path,  and  S(z)  is  a  sigmoid  function  given  by 

S(*)  =  —  .  (7) 

1  +  exp(-  z ) 

The  sigmoid  function  would  produce  modulated  values  ranging  from  0  to  1. 

Let  the  t-th  change  of  the  weight  be  AJ0p(ri)  and  the  t-th  change  of  the  bias  be  Ablp(t).  The  error 

function  is  defined  as 

E  =  ^(T-0f  (8) 
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where  T  and  O  denote  the  desired  output  value  and  the  actual  output  value,  respectively  when  the  input  nodes 
N°(n,s),  are  entered  in  the  network.  In  this  model,  the  error  back  propagation  algorithm^  which  updates  the 
kernel  weights,  was  given  below: 


A»>  +  1]  =  vtZK"  »>.'S))+  aAW'r  [r]  (9) 

A*;['+1]  =  9SZ<5;(»,s)+<^6;m  (10) 

S'p{n,s)=NLp(n,Sis';\n,s)-W^n,s))  (11) 

where  s  =  0  when  1*0.  In  the  case  of  the  last  layer, 

%=s'(ivfr)Xr-o)  (12) 


where  S’  (z),  7,  a,  and  T  denote  the  derivative  of  S(z),  the  learning  rate,  the  weighting  factor  contributed  by  the 
momentum  term,  and  the  desired  output  image,  respectively. 

During  the  training,  we  added  an  isotropic  constraint  to  the  weights  in  the  1-D  convolution  kernels  and 

so  that 


W°p(n,-s)  =  W°p(n,s)  (13) 

where  p  is  not  the  fully  connected  path.  These  additional  constraints  were  used  to  induce  the  kernels 
functioning  as  correlation  processing  filters  and  could  facilitate  the  algorithm  in  searching  for  an  appropriate 
linear  filter. 


(Bl  Training  methods  and  the  utilization  of  characteristics  of  flipping  invariance  of  the  features 

Because  we  used  the  circular  paths,  there  were  no  starting  and  ending  sectors.  The  forward  and  back 
propagation  computation  can  be  started  from  any  sector.  Since  the  mass  characteristics  of  the  flipped  patch 
remained  the  same,  we  flipped  each  patch  in  the  training  set  and  kept  the  same  numerical  value  for  the  target 
output. 

Since  we  designed  a  10°  increment  for  each  rotation,  each  SC  or  NC  networking  would  need  to  process 
through  36  times  for  the  computed  feature  set  for  each  image  patch.  To  simplify  this  network  computation,  we 
shifted  one  small  set  (4  nodes)  on  the  input  layer  a  time  to  conduct  the  circular  convolution  process  with  the  SC 
and  NC  kernels.  By  reversing  the  sequence  of  the  sector,  we  can  train  the  flipped  version  of  the  suspicious 
masses.  Hence,  the  characteristics  of  flipping  invariance  literally  increase  the  number  of  the  training  set  by  a 
factor  of  2. 
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Figure  3.  Three  types  of  network  paths  connecting  the  input  and  the  hidden  layers: 

(A)  Full  connection. 

(B)  A  self  correlation  (SC)  path;  each  node  on  the  layer  connects  to  a  single  set  of  the  features  (l,a,g,c)  for  the 
fan-in  and  fully  connects  to  the  hidden  nodes  for  fan-out. 

(C)  A  neighborhood  correlation  (NC)  path;  each  node  on  the  layer  connects  to  five  adjacent  sets  of  the  features 
for  the  fan-in  and  fully  connects  to  the  hidden  nodes  for  fan-out. 

Note  that  the  fan-in  nets  emphasizing  self  correlation  in  (B~)  and  neighborhood  correlation  in  (Q  represent 
convolution  weights  (i.e.,  the  same  type  of  sectors  possess  the  same  set  of  weighting  factors). 


3.4.  Summary  of  feature  arrangement  methods  and  the  MCPNN 

We  have  described  our  approach  on  the  feature  extraction,  the  design  of  MCPNN,  and  its  corresponding 
training  method.  Figure  4  shows  a  flow  diagram  of  the  proposed  method.  Since  the  MCPNN  only  alters  the 
input  data  connection  from  the  input  to  the  first  hidden  layer,  any  learning  algorithm  can  be  applied  within  the 
neural  network.  For  simplicity,  we  used  the  back  propagation  algorithm  for  both  the  conventional  and  proposed 
neural  network  systems  in  the  following  experiments. 
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Figure  4.  A  schematic  diagram,  showing  the  MCPNN  and  sector  features  of  masses,  was  used  in  the  following 
study. 


4.  Experiments  and  Results 

We  selected  91  mammograms  and  digitized  each  mammogram  with  a  computer  format  of  2048x 
2500x12  bits  (for  an  8"xll"  area  where  each  image  pixel  represents  100  pm  square).  No  two  mammograms 
were  selected  from  the  same  patient  film  jacket.  All  the  digitized  mammograms  were  miniaturized  to 
512x625x12  bits  using  4X4  pixel  averaging  and  were  processed  by  the  above  methods  to  perform  mass 
detection.  Based  on  the  corresponding  biopsy  reports,  one  experienced  radiologist  read  all  91  mammograms 
and  identified  75  areas  containing  masses.  (Note  that  the  reports  recorded  the  malignancy  of  the  biopsy 
specimens.  The  radiologist  only  used  them  as  reference  for  the  identification  of  masses.)  Through  the  pre- 
process  and  the  first  step  screen  based  on  the  circularity  test,  a  total  of  125  suspicious  areas  were  extracted  from 
the  91  digitized  mammograms. 


4.1.  Experiment  1 

We  randomly  selected  54  computer-segmented  areas  where  30  patches  were  matched  with  the 
radiologist’s  identification  and  24  were  not.  This  database  was  used  to  train  two  neural  network  systems:  (1)  a 
conventional  3-layer  BP  neural  network  (with  125  nodes  in  the  hidden  layer)  and  (2)  the  proposed  MCPNN 
training  method  using  the  same  neural  network  learning  algorithm.  The  structure  of  the  MCPNN  was  described 
earlier.  However,  we  used  one  fully  connected  path,  four  SC  paths,  four  20°  NC  paths,  four  30°  NC  paths,  three 
40°  NC  paths,  and  two  50°  NC  paths  in  the  first  step  network  connection  for  the  MCPNN.  All  paths  in  the 
neural  network  have  their  hidden  layers.  Only  one  hidden  layer  per  path  was  used.  Both  neural  network 
systems  were  trained  by  the  error  back  propagation  algorithm  by  feeding  the  features  from  the  input  layer  and 
registering  the  corresponding  target  value  at  the  output  side.  Once  the  training  of  the  neural  networks  was 
complete,  we  then  used  the  remaining  71  computer  segmented  areas  for  the  testing.  None  of  the  images  and 
their  corresponding  patients  in  the  testing  set  could  be  found  in  the  training  set.  The  neural  network  output 
values  were  fed  into  the  LABROC  program  [21]  for  the  performance  evaluation.  The  results  indicated  that  the 
areas  (Az)  under  the  receiving  operator  characteristic  (ROC)  curves  were  0.781  and  0.844  using  the 
conventional  BPNN  and  the  MCPNN,  respectively.  The  ROC  curves  of  these  two  neural  network  training 
methods  are  shown  in  Figure  5(A).  We  also  invited  another  senior  mammographer  to  conduct  an  ROC  observer 
study.  The  mammographer  was  asked  to  rate  each  patch  using  a  numerical  scale  ranging  0-10  for  its  likelihood 
of  being  a  mass.  These  71  numbers  were  also  fed  into  the  LABROC  program.  The  mammographer’s 
performance  in  Az  on  this  set  of  test  cases  was  0.909.  The  corresponding  ROC  curve  is  also  shown  in  Figure 
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5(A). 


4.2.  Experiment  2 

We  also  conducted  a  leave-one-case-out  experiment  using  the  same  database.  In  this  experiment,  we 
used  those  patches  extracted  from  90  mammograms  for  the  training  and  used  the  patches  (most  of  them  are 
single)  extracted  from  the  remaining  one  mammogram  as  test  objects.  The  procedure  was  repeated  91  times  to 
allow  every  suspicious  patch  from  each  mammogram  to  be  tested  in  the  experiment.  For  each  individual 
suspicious  area,  the  computed  features  were  identical  to  those  used  in  Experiment  1.  Again,  both  neural 
network  systems  were  independently  evaluated  with  the  same  procedure.  The  results  indicated  that  the  Az 
values  were  0.799  and  0.887  using  the  conventional  back  propagation  neural  network  and  the  MCPNN, 
respectively.  Figure  5(B)  shows  the  ROC  curves  of  these  two  neural  network  systems  using  the  leave-one-of- 
out  procedure  in  the  experiment. 


ROC  Curves  of  The  Mammographer  and 
Two  Different  Neural  Network  Training 
Methods  in  Experiment  1. 


ROC  Curves  of  The  Two  Different 
Training  Methods  in  Experiment  2. 
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Figure  5.  The  ROC  curves  obtained  from  corresponding  experiments. 

(A)  The  left  figure  shows  that  the  performance  of  MCPNN  training  method  is  superior  to  that  of  the 
conventional  input  method.  The  highest  curve  is  the  ROC  performance  of  the  senior  mammographer. 

(B)  The  right  figure  shows  the  ROC  results  with  higher  performance  using  the  leave-one-case-out  procedure  as 
described  in  Experiment  2. 


5.  Conclusions  and  Discussion 

Through  this  study,  we  found  that  the  selected  features  are  somewhat  effective  in  the  detection  of 
masses.  These  features  were  “computationally  translated”  from  the  qualitative  descriptors  of  BI-RAD. 
Another  uniqueness  of  this  study  was  on  the  test  of  our  newly  developed  MCPNN  training  method.  In 
Experiment  1,  we  found  that  the  performances  of  both  neural  network  systems  were  increased.  This  might  be 
due  to  the  increased  number  of  cases  (from  54  to  124)  in  the  training  set.  In  Experiment  2,  the  Az  value  was 
improved  by  0.043  using  the  MCPNN  training  method  that  was  higher  than  Az  difference  of  0.018  obtained  by 
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the  conventional  training  method.  The  results  implied  that  the  MCPNN  learned  more  effectively  than  the 
conventional  BP  when  the  number  of  training  cases  was  increased. 

It  is  known  in  the  field  of  artificial  intelligence  that  the  key  factors  in  pattern  recognition  are:  (1) 
effective  methods  in  the  extraction  of  features  and  (2)  analytic  methods  (e.g.,  back  propagation  neural  network) 
for  the  extracted  features.  In  this  study,  we  showed  that  the  training  method  designed  to  guide  the  analyzer  is 
also  an  important  factor  to  a  success  of  a  pattern  recognition  task.  Though  this  finding  is  not  new,  the  research 
of  developing  training  methods  for  various  pattern  recognition  tasks  has  not  established  in  the  field  of  medical 
imaging.  In  this  work,  we  demonstrated  that  organized  features  with  proper  network  connection  and  task- 
oriented  guidance  would  assist  the  neural  network  in  performing  the  task. 

As  far  as  the  research  in  recognition  of  masses  is  concerned,  we  believe  that  main  concept  of  using 
sectors  is  an  effective  approach.  Note  that  any  features  arranged  in  the  polar  coordinate  system  can  be 
trained  by  the  MCPNN  method.  Since  the  MCPNN  only  coordinates  the  input  data,  the  internal  neural 
network  learning  algorithm  can  be  changed  to  other  learning  algorithms.  A  technique  using  the  rubber  band 
straightening  transformation,  independently  developed  by  Sahnier  et  al.  [22],  for  the  detection  of  masses  also 
employs  a  similar  concept  in  extracting  feature  and/or  texture  in  the  polar  coordinate.  We  believe  that 
integration  of  effective  feature  and  texture  values  computed  at  small  sectors  will  be  the  research  trend  in  mass 
detection.  This  paper  focuses  on  neural  network  design  and  arrangement  of  features  for  effective  pattern 
recognition  of  masses  in  medical  imaging. 
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